DATA RECEPTION APPARATUS, DATA RECEPTION METHOD,
DATA TRANSMISSION METHOD, AND DATA STORAGE MEDIA
FIELD OF THE INVENTION

The present invention relates to data reception apparatuses, data reception methods, data transmission methods, and data storage media. More particularly, the invention relates to a transmission process of transmitting control data, including a storage location and a playback start time of media data, from a server distributing the media data; a reception process of accessing the server to receive and play the media data; and a data storage medium having a program for making a computer perform the above-mentioned transmission process and reception process.

BACKGROUND OF THE INVENTION

In recent years, with the advance of compression technology for video data and audio data and the increase in transmission capacity of networks such as the Internet and wireless networks, we can see services handling data such as video, audio, text, and the like, which are called media data.

These services have conventionally been distributed by a downloading scheme. In the downloading scheme, all of the media data required for playback are downloaded from a server to a client terminal through a network and, after completion of the downloading, playback and display of the media data are performed at the client terminal.
Recently, the services handling the above-mentioned media data have come to adopt a streaming scheme instead of the downloading scheme. In the streaming scheme, reception of media data from a server at a client terminal through a network is performed in parallel with playback and display of the received media data at the client terminal.

Since, in the streaming scheme, playback and display of the media data are performed before reception of the media data is completed, the most striking characteristic of the streaming scheme is that a service adopting this scheme can reduce the latency time from when program data is requested to when playback and display of the program data are performed, even when the service distributes a program lasting many hours.

In the future, services distributing media data as described above will go beyond playback and display of single media data such as video data or audio data, to be extended to services capable of simultaneous playback and display of plural pieces of media data, such as video data, still-picture data, text data, and the like.

Hereinafter, a description will be given of a process of simultaneously playing plural pieces of media data by the streaming scheme to display, for example, one background and two foregrounds at the same time.

Figure 11(a) is a diagram for explaining the spatial arrangement of media data.
In figure 11(a), a predetermined image space 1100 is a rectangle background display region (bg region) 1110 where a background image (bg) is displayed. In the rectangle background display region 1110, there are a first rectangle foreground display region (adv region) 1120, where a first foreground image (adv), that is, a picture of an advertisement or the like, is placed, and a second rectangle foreground display region (mov region) 1130, where a second foreground image (mov) as a moving picture is placed.

For the predetermined image space 1100, a coordinate system indicating the positions in the image space 1100 is defined by the number of horizontal points, corresponding to the number of pixels in the horizontal direction, and the number of vertical points, corresponding to the number of pixels in the vertical direction. For example, the upper left corner of the background display region (entire scene) 1110 is in a position where the number of horizontal points and the number of vertical points are both 0. The size of the background display region (entire scene) 1110 in the horizontal direction (width) is 300 points, and its size in the vertical direction (height) is 200 points. The upper left corner of the first foreground display region (adv region) 1120 is in a position where the number of horizontal points is 0 and the number of vertical points is 150. The size of the first foreground display region 1120 in the horizontal direction (width) is 300 points, and its size in the vertical direction (height) is 50 points. The upper left corner of the second foreground display region (mov region) 1130 is in a position where the number of horizontal points is 50 and the number of vertical points is 0. The size of the second foreground display region 1130 in the horizontal direction (width) is 200 points, and its size in the vertical direction (height) is 150 points.
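The coordinates above can be checked with a short Python sketch. It is illustrative only: the names Region, contains, and overlaps are hypothetical, and it assumes half-open rectangles, so two regions that merely share an edge are not counted as overlapping. With the figure 11(a) values, both foreground regions lie inside the background, and the mov region ends exactly where the adv region begins (vertical point 150).

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Axis-aligned rectangle in image-space points (origin at upper left).

    Hypothetical helper type for illustration, not part of the described apparatus.
    """
    name: str
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def contains(outer: Region, inner: Region) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def overlaps(a: Region, b: Region) -> bool:
    """Half-open test: regions that only share an edge do not overlap."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


# Values from figure 11(a).
BG = Region("bg", 0, 0, 300, 200)
ADV = Region("adv", 0, 150, 300, 50)
MOV = Region("mov", 50, 0, 200, 150)

if __name__ == "__main__":
    for fg in (ADV, MOV):
        print(fg.name, "inside bg:", contains(BG, fg))
    print("adv/mov overlap:", overlaps(ADV, MOV))
```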

Figure 11(b) is a diagram for explaining the temporal arrangement of the media data, showing the timings when the background image and the first and second foreground images are displayed in the predetermined image space.

In the temporal arrangement of the media data shown in figure 11(b), when a reference time T of the client terminal becomes a display start time Tbg (Tbg = 0 sec.) of the background image, the background image (bg) appears in the image space 1100. Further, when the reference time T of the client terminal becomes a display start time Tadv (Tadv = 5 sec.) of the first foreground image (adv), the first foreground image (adv) appears in the image space 1100. Further, when the reference time T of the client terminal becomes a display start time Tmov (Tmov = 10 sec.) of the second foreground image (mov), the second foreground image (mov) appears in the image space 1100.

In order to actually perform the process of simultaneously playing the plural pieces of media data for display at the client terminal, information (scene description data) for combining the respective media data (i.e., the background image (bg), the first foreground image (adv), and the second foreground image (mov)) is required. The scene description data designates the temporal arrangement (refer to figure 11(b)) and the spatial arrangement (refer to figure 11(a)) of the respective media data. There is scene description data whose contents are described in a language standardized by the W3C (World Wide Web Consortium), such as "SMIL (Synchronized Multimedia Integration Language)" or "HTML (Hyper Text Markup Language) + TIME (Timed Interactive Multimedia Extensions)".

Hereinafter, a description will be given of the SMIL as one of the languages expressing the scene description data.

Figure 12 is a diagram for explaining an example of scene description data according to the SMIL.

In figure 12, the character strings described at the heads of the respective rows of the scene description SD, i.e., <smil>, </smil>, <head>, </head>, <layout>, </layout>, <root-layout>, <region>, <body>, <par>, </par>, <video>, and the like, are called "elements", and declare the contents of the descriptions which follow the elements.

For example, the smil element and the /smil element declare that the rows positioned between the row 710a including the smil element and the row 710b including the /smil element are described according to the SMIL. The head element and the /head element declare that the rows positioned between the row 720a including the head element and the row 720b including the /head element describe information for defining the regions where the respective images (bg), (adv), and (mov) are placed in the image space shown in figure 11(a). Further, the layout element and the /layout element declare that the rows 701 to 703, which include information relating to the positions of the background image and the foreground images to be played in parallel with each other (at the same time), are placed between the row 730a including the layout element and the row 730b including the /layout element.

Furthermore, the root-layout element 701a declares that the description in the row 701 including this element designates the image to be displayed as the background image (entire scene) and designates the size of the background image. The region element 702a (703a) declares that the description in the row 702 (703) including this element designates the size of one rectangle region where a foreground image is placed, and the position of the rectangle region in the entire scene (image space).

The body element and the /body element declare that the rows positioned between the row 740a including the body element and the row 740b including the /body element describe information (URL) indicating the location of the media data to be played, and information relating to the time when the media data is to be displayed. Further, the par element and the /par element declare that the rows 704 and 705, which include media elements and attribute information relating to the media data to be played in parallel with each other (at the same time), are grouped and placed between the row 750a including the par element and the row 750b including the /par element.

Each of the video elements 704a and 705a declares that the description in the row including this element designates video data.

Furthermore, the character strings "id", "width", "height", "left", "top", "src", "begin", and the like, which follow the above-mentioned root-layout element, region element, and video element, are called "attributes", and designate detailed information in the rows including the respective elements.

To be specific, the id attributes in the rows 701, 702, and 703, which include the root-layout element and the two region elements, designate the media data, i.e., the background image, the first foreground image, and the second foreground image, respectively.

Further, the width attribute and the height attribute in the row 701 including the root-layout element 701a designate the width and the height of the background image (entire scene): the size of the background (entire scene) is designated such that the width is 300 points (width="300") and the height is 200 points (height="200").

Further, the width attribute and the height attribute in the row 702 (703) including the region element 702a (703a) designate the width and the height of the corresponding rectangle region, and the left attribute and the top attribute designate the position of the upper left corner of the rectangle region with respect to the upper left corner of the entire scene.

For example, in the row 702 including the region element, the id attribute (id=adv) designates the first rectangle region 1120 (refer to figure 11(a)) where the media data corresponding to the region attribute value (region=adv) is displayed. The position of the upper left corner of this first rectangle region is designated by the left attribute (left=0) and the top attribute (top=150); that is, it is set at a distance of 0 points in the horizontal direction and 150 points in the vertical direction from the upper left corner of the image space as a reference point. Further, the size of this first rectangle region is designated by the width attribute (width=300) and the height attribute (height=50); that is, the first rectangle region is 300 points wide and 50 points high.

In the row 703 including the region element, the id attribute (id=mov) designates the second rectangle region 1130 (refer to figure 11(a)) where the media data corresponding to the region attribute value (region=mov) is displayed. The position of the upper left corner of this second rectangle region is designated by the left attribute (left=50) and the top attribute (top=0); that is, it is set at a distance of 50 points in the horizontal direction and 0 points in the vertical direction from the upper left corner of the image space as a reference point. Further, the size of this second rectangle region is designated by the width attribute (width=200) and the height attribute (height=150); that is, the second rectangle region is 200 points wide and 150 points high.

The arrangement information described in the row 702 including the region element is applied to the media data designated by the region attribute value (region=adv) in the row 704 including the video element, and the arrangement information described in the row 703 including the region element is applied to the media data designated by the region attribute value (region=mov) in the row 705 including the video element.

Further, the src attribute in the row 704 (705) including the video element 704a (705a) designates the transmission scheme and the storage location of the media data on the server. The information designated by the src attribute is required to request the media data from the server, because the SMIL data is not provided with the media data, such as video, itself.

In the row 704 (705) including the video element, RTSP (real time streaming protocol), which is a protocol (procedure) for exchanging a data request message between the transmitting end and the receiving end, is designated as the transmission scheme. In the row 704 including the video element, data (adv.mpg) stored in a server (s2.com) is designated as the media data corresponding to the first foreground image (adv). In the row 705 including the video element, data (mov.mpg) stored in a server (s3.com) is designated as the media data corresponding to the second foreground image (mov).

Therefore, at the client terminal, messages requesting the media data (adv.mpg) and the media data (mov.mpg) are issued to the server (s2.com) and the server (s3.com) designated by the descriptions in the rows 704 and 705 including the video elements, respectively, by using RTSP (Real Time Streaming Protocol), which is the media data transmission protocol (procedure), and the media data are transmitted and received by using RTP (Realtime Transport Protocol).

Furthermore, the begin attribute in the row 704 (705) including the video element designates the time to start display of the media data, taking the time to start display of the scene as the starting point (0 sec.). The temporal arrangement of each media data depends on the begin attribute and the [...]. In the description in the row 704 including the video element, the begin attribute is set at 5 sec. (begin="5"); that is, the temporal arrangement of the first foreground image is designated such that display of this image will be started five seconds after display of the scene is started. In the description in the row 705 including the video element, the begin attribute is set at 10 sec. (begin="10"); that is, the temporal arrangement of the second foreground image is designated such that display of this image will be started ten seconds after display of the scene is started.

Next, a description will be given of a conventional data reception apparatus mounted on a personal computer as an example of the above-mentioned client terminal.

Figure 13 is a block diagram for explaining the data reception apparatus.

The data reception apparatus 901 obtains, as scene description data, the SMIL data shown in figure 12 from the server, obtains the media data designated by the SMIL data from the server, and performs playback and display of the obtained media data.

To be specific, the data reception apparatus 901 includes: a plurality of data reception units 902a and 902b for receiving image data (media data) Dm1 and Dm2 corresponding to the respective images constituting a scene, and outputting the image data; a plurality of image decoding units 903a and 903b for decoding the image data Dm1 and Dm2 outputted from the respective data reception units 902a and 902b to output decoded image data Dd1 and Dd2; a plurality of frame memories 904a and 904b for storing, in units of frames, the decoded image data Dd1 and Dd2 supplied from the respective image decoding units 903a and 903b; and a display unit 905 for receiving the decoded image data Dd1 and Dd2 read from the respective frame memories 904a and 904b, combining the image data corresponding to the respective images to construct one scene on the basis of control data Dc1, and displaying the scene.

The data reception apparatus 901 further includes: an SMIL request/reception unit 906 for outputting an SMIL request signal Srd to request SMIL data Ds from a predetermined remote server on the basis of third control data Dc3, and receiving the SMIL data Ds from the remote server to analyze it; a control data generation unit 907 for receiving SMIL analysis data Da obtained by the analysis of the SMIL data, storing, as first control data Dc1, information relating to the spatial arrangement and temporal arrangement of each image corresponding to each video element, and storing, as second control data Dc2, information relating to a transmission scheme and a storage place for the image data (media data) corresponding to each image; a data request/reception unit 908 for outputting a data request signal Sr to request image data from the remote server on the basis of the control data Dc2 from the control data generation unit 907, receiving an acknowledge signal Sack to the request, and outputting the data obtained from the acknowledge signal to the control data generation unit 907; and a clock circuit 909 for providing the respective components of the data reception apparatus 901 with time information.

The data reception apparatus 901 possesses as many data reception units, image decoding units, and frame memories as the number of image data (media data) to be received. The data request/reception unit 908 requests the scene description data for playing a predetermined scene, according to a user operation.

Hereinafter, the operation of the data reception apparatus 901 will be described.

Figure 14 is a diagram for explaining the flow of a procedure by which the data reception apparatus 901 obtains data from the server, illustrating an example of RTSP (Real Time Streaming Protocol).

It is assumed that the data reception apparatus 901 is mounted on a personal computer as a client terminal, and that the data reception apparatus 901 is supplied with the SMIL data shown in figure 12 as scene description data SD.

When the user, who is viewing a home page described in HTML (Hyper Text Markup Language) using a Web browser installed in the personal computer, clicks a region on the home page linked to predetermined SMIL data Ds, the data reception apparatus 901 of the client terminal issues an SMIL request command (GET http://s1.com/scene.smil) C1 for requesting the SMIL data Ds. This command C1 requests the server (s1.com) 13a to distribute the SMIL data by HTTP.

On receipt of the SMIL request command C1, the server 13a issues an acknowledge (HTTP/1.0 OK) indicating that the command has been accepted, to the client terminal, and transmits the SMIL data (scene.smil) Ds to the client terminal.

In the data reception apparatus 901 of the client terminal, the SMIL request/reception unit 906 receives the SMIL data Ds and analyzes it.

The SMIL analysis data Da obtained by the analysis of the SMIL data is stored in the control data generation unit 907.

That is, the control data generation unit 907 holds information relating to the size of the background image (entire scene) described as the root-layout element, and information relating to the src attribute, top attribute, left attribute, width attribute, height attribute, and begin attribute described as the video element. To be specific, the src attribute information includes information indicating the storage place of each image data, and each of the top attribute information and the left attribute information includes information about the position of the rectangle region where the foreground image is placed, with the upper left corner of the scene as a reference point. The width attribute information and the height attribute information include information about the size (width and height) of the rectangle region in the horizontal direction and the vertical direction, respectively. The begin attribute information includes the display start time at which display of the media data corresponding to each video element is started.
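As a rough illustration of what the control data generation unit 907 stores, the following Python sketch parses a scene description and splits it into first control data (region geometry and begin time) and second control data (transmission scheme and storage location taken from the src attribute). Figure 12 itself is not reproduced here, so the SMIL text below is a hypothetical reconstruction from the attribute values quoted in the description, and the function name generate_control_data is an assumption for illustration, not the apparatus's actual interface.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical reconstruction of the figure 12 scene description, built only
# from the attribute values quoted in the text (the figure is not shown here).
SCENE_SMIL = """\
<smil>
  <head>
    <layout>
      <root-layout id="bg" width="300" height="200"/>
      <region id="adv" left="0" top="150" width="300" height="50"/>
      <region id="mov" left="50" top="0" width="200" height="150"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="rtsp://s2.com/adv.mpg" region="adv" begin="5"/>
      <video src="rtsp://s3.com/mov.mpg" region="mov" begin="10"/>
    </par>
  </body>
</smil>
"""


def generate_control_data(smil_text):
    """Split parsed SMIL into (first, second) control data keyed by image id.

    first:  spatial arrangement (left, top, width, height) and begin time
    second: transmission scheme, server, and path taken from the src attribute
    """
    root = ET.fromstring(smil_text)
    scene = root.find("head/layout/root-layout")
    first = {scene.get("id"): {
        "geometry": (0, 0, int(scene.get("width")), int(scene.get("height"))),
        "begin": 0.0,  # the background is shown as soon as scene display starts
    }}
    regions = {
        r.get("id"): tuple(int(r.get(k)) for k in ("left", "top", "width", "height"))
        for r in root.iter("region")
    }
    second = {}
    for video in root.iter("video"):
        region_id = video.get("region")
        first[region_id] = {
            "geometry": regions[region_id],
            # Plain seconds, as written in the text (begin="5").
            "begin": float(video.get("begin", "0")),
        }
        url = urlparse(video.get("src"))
        second[region_id] = {"scheme": url.scheme, "server": url.netloc, "path": url.path}
    return first, second


if __name__ == "__main__":
    first, second = generate_control_data(SCENE_SMIL)
    for image_id, info in first.items():
        print(image_id, info, second.get(image_id))
```

Keeping the two kinds of control data in separate tables mirrors the split between Dc1, which the display unit 905 consumes, and Dc2, which drives the data request/reception unit 908.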

The display unit 905 starts the process of creating a scene and displaying it, on the basis of the contents stored in the control data generation unit 907. To be specific, the background image (bg) corresponding to the root-layout element is displayed over the image space 1100 upon starting the display process. At this time, the time information outputted from the clock circuit 909 is set at zero.

Since, in the SMIL data Ds, the display start time of the first foreground image (adv) is set at five seconds and the display start time of the second foreground image (mov) is set at ten seconds, the display unit 905 does not perform the process of combining the image data with reference to the frame memories 904a and 904b during the period from 0 seconds to five seconds.

When the time information outputted from the clock circuit 909 becomes 5 seconds, messages requesting the image data (adv.mpg) corresponding to the first foreground image are exchanged between the data request/reception unit 908 and the second server (s2.com) 13b, on the basis of the src attribute of the video element 704a stored in the control data generation unit 907, by using RTSP (Real Time Streaming Protocol) as a communication protocol. Thereafter, the server transmits the image data (adv.mpg) using RTP (Realtime Transport Protocol).

To be specific, as shown in figure 14, the data reception apparatus 901 of the client terminal issues a command (DESCRIBE rtsp://s2.com/adv.mpg) C2, requesting specific information relating to the media data corresponding to the first foreground image (adv) (e.g., coding conditions, existence of plural candidate data, etc.), to the second server (s2.com) 13b.

On receipt of the command C2, the second server 13b issues an acknowledge (RTSP/1.0 OK) R2 indicating that the command C2 has been accepted, to the client terminal, and transmits SDP (Session Description Protocol) information to the client terminal.

Next, the data reception apparatus 901 of the client terminal issues to the second server 13b a setup request command (SETUP rtsp://s2.com/adv.mpg) C3, which requests the second server (s2.com) 13b to set up provision of the media data corresponding to the first foreground image (adv). Upon completion of setup for the media data, the second server 13b issues an acknowledge (RTSP/1.0 OK) R3 indicating that the command C3 has been accepted, to the client terminal.

When the data reception apparatus 901 of the client terminal issues a data request command (PLAY rtsp://s2.com/adv.mpg) C4, requesting the media data corresponding to the first foreground image (adv), to the second server (s2.com) 13b, the second server 13b issues an acknowledge (RTSP/1.0 OK) R4 indicating that the command C4 has been accepted, to the client terminal. Thereafter, the second server 13b stores the media data Dm1 corresponding to the first foreground image (adv.mpg) in RTP packets, and successively transmits the RTP packets to the client terminal.
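The three requests C2 to C4 in figure 14 can be sketched as RTSP/1.0 text messages (request line, CSeq header, extra headers, blank line). This Python sketch is an assumption-laden illustration: the helper names rtsp_request and session_requests are hypothetical, the client port and session identifier are placeholders (a real client copies the Session value from the server's SETUP reply), and, as in figure 14, SETUP is sent to the same URL, although real servers often expect a per-track control URL taken from the SDP. Because each request waits for its acknowledge, at least three round trips elapse before any RTP packet arrives, which is the source of the delay discussed next.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Build one RTSP/1.0 request: request line, CSeq, extra headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def session_requests(url, client_port=5000, session_id="12345678"):
    """Requests corresponding to C2 (DESCRIBE), C3 (SETUP), and C4 (PLAY).

    client_port and session_id are hypothetical placeholders: a real client
    picks free ports and copies the Session header from the SETUP reply.
    """
    return [
        rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}),
        rtsp_request("SETUP", url, 2, {
            "Transport": f"RTP/AVP;unicast;client_port={client_port}-{client_port + 1}"}),
        rtsp_request("PLAY", url, 3, {"Session": session_id}),
    ]


if __name__ == "__main__":
    for message in session_requests("rtsp://s2.com/adv.mpg"):
        print(message.replace("\r\n", "\n"), end="")
```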

The media data Dm1 is received by the corresponding data reception unit 902a and output to the corresponding image decoding unit 903a. The image decoding unit 903a decodes the media data, and the decoded media data Dd1 is stored in the corresponding frame memory 904a in units of frames. At this point of time, playback of the media data Dm1 becomes possible. However, three seconds have passed since the client terminal started the request for the media data Dm1 from the server (i.e., since the output of the counter was five seconds), while the client terminal and the server exchanged the messages.

In this way, since the client terminal exchanges messages with the server to obtain the media data from the server, the time when playback of the first foreground image becomes possible at the client end is later than the display start time of the first foreground image described in the SMIL data.

Therefore, in the display unit 905, the first foreground image is displayed three seconds after the display start time of the first foreground image described in the SMIL data.

That is, when the time information from the clock circuit 909 reaches 8 seconds, it is judged whether or not one frame of decoded image data of the first foreground image is stored in the frame memory 904a. When one frame of decoded image data is stored, the first foreground image is superimposed on the background image for display.
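The display unit's rule can be modelled in a few lines of Python: a layer is combined only when its begin time has passed and its frame memory already holds a decoded frame. This is a simplified sketch, and the function name composite_layers is hypothetical. The ready times assume the conventional behaviour described here: each request is issued at the layer's begin time and takes three seconds (8 s for adv, and, as an assumption, the same 3 s for mov), while the background is taken to be ready at time zero.

```python
from typing import Dict, List

# Display start times designated by the begin attributes (figure 11(b)).
BEGIN = {"bg": 0.0, "adv": 5.0, "mov": 10.0}
# Assumed times at which the first decoded frame is in frame memory for the
# conventional client: request issued at the begin time, 3 s until ready.
FIRST_FRAME_AT = {"bg": 0.0, "adv": 8.0, "mov": 13.0}


def composite_layers(t: float,
                     begin: Dict[str, float],
                     first_frame_at: Dict[str, float]) -> List[str]:
    """Layers the display unit combines at reference time t, bottom layer first.

    A layer is combined only when its begin time has passed and its frame
    memory already holds at least one decoded frame.
    """
    ordered = sorted(begin, key=begin.get)
    return [name for name in ordered
            if t >= begin[name] and t >= first_frame_at.get(name, float("inf"))]


if __name__ == "__main__":
    for t in (0, 5, 8, 10, 13):
        print(t, composite_layers(t, BEGIN, FIRST_FRAME_AT))
```

Between 5 s and 8 s the adv layer is scheduled but not yet combinable, which is exactly the three-second lag described above.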

When the image data are video data, the image data are successively input to the data reception unit 902a and successively decoded by the image decoding unit 903a, and the decoded image data are successively stored in the frame memory 904a in units of frames. In the display unit 905, the image data corresponding to the respective frames stored in the frame memory 904a are successively combined with the data of the background image for display.

When the time information outputted from the clock circuit 909 reaches 10 seconds, messages requesting the image data (mov.mpg) corresponding to the second foreground image are exchanged between the data request/reception unit 908 and the third server (s3.com) 13c, on the basis of the src attribute of the video element 705a stored in the control data generation unit 907, by using RTSP (Real Time Streaming Protocol) as a communication protocol. Thereafter, the server transmits the image data (mov.mpg) by using RTP (Realtime Transport Protocol).

To be specific, as shown in figure 14, the data reception apparatus 901 of the client terminal issues a command (DESCRIBE rtsp://s3.com/mov.mpg) C5, requesting specific information relating to the media data corresponding to the second foreground image (mov) (e.g., coding conditions, existence of plural candidate data, etc.), to the third server (s3.com) 13c.

On receipt of the command C5, the third server 13c issues an acknowledge (RTSP/1.0 OK) R5 indicating that the command C5 has been accepted, to the client terminal, and transmits SDP (Session Description Protocol) information to the client terminal.

Next, the data reception apparatus 901 of the client terminal issues a setup request command (SETUP rtsp://s3.com/mov.mpg) C6, requesting the third server (s3.com) 13c to set up provision of the media data (image data) corresponding to the second foreground image (mov). Upon completion of setup for the media data, the third server 13c issues an acknowledge (RTSP/1.0 OK) R6 indicating that the command C6 has been accepted, to the client terminal.

When the data reception apparatus 901 of the client terminal issues a data request command (PLAY rtsp://s3.com/mov.mpg) C7, requesting the media data corresponding to the second foreground image (mov), to the third server (s3.com) 13c, the third server 13c issues an acknowledge (RTSP/1.0 OK) R7 indicating that the command C7 has been accepted, to the client terminal. Thereafter, the third server 13c stores the media data Dm2 corresponding to the second foreground image (mov.mpg) in RTP packets, and successively transmits the RTP packets to the client terminal.

The media data Dm2 is received by the corresponding data reception unit 902b and output to the corresponding image decoding unit 903b. The image decoding unit 903b decodes the media data Dm2 in the same manner as the media data Dm1, and the decoded media data Dd2 is stored in the corresponding frame memory 904b in units of frames. At this point of time, playback of the media data Dm2 becomes possible. However, a predetermined time has passed since the client terminal started the request for the media data Dm2 from the server (i.e., since the output of the counter was ten seconds), while the client terminal and the server exchanged the messages.

In this way, since the client terminal exchanges messages with the server to obtain the media data from the server, the time when playback of the second foreground image becomes possible at the client end is later than the display start time of the second foreground image described in the SMIL data.

Therefore, in the display unit 905, the second foreground image is displayed three seconds after the display start time of the second foreground image described in the SMIL data.

At this time, in the display unit 905, the background image (bg) is displayed in the background display region 1110 of the image space 1100 (refer to figure 11(a)), the first foreground image (adv) is displayed in the first foreground display region 1120, and the second foreground image (mov) is displayed in the second foreground display region 1130. That is, a composite image comprising the background image (bg) and the first and second foreground images (adv and mov) is displayed in the image space 1100.

However, the conventional data reception apparatus 901, which issues the image data request message to the server on the basis of the contents of the SMIL scene description, has the following drawbacks.

In the scene description, the begin attribute attached to the first video element 704a indicates that the process of displaying the first foreground image (adv) starts when five seconds have passed from the display start time of the entire scene. Further, the begin attribute attached to the second video element 705a indicates that the process of displaying the second foreground image (mov) starts when ten seconds have passed from the display start time of the entire scene. Therefore, in the conventional data reception apparatus 901 mounted on the client terminal (reception terminal), the data request message requesting the image data corresponding to the first foreground image is issued to the second server 13b when five seconds have passed after the display start time of the entire scene, and the data request message requesting the image data corresponding to the second foreground image is issued to the third server 13c when ten seconds have passed after the display start time of the entire scene.

In this case, there is a delay from when the client terminal requests the image data from the server to when the image data from the server becomes displayable at the client terminal. For example, this delay corresponds to the time required for the RTSP message exchange between the server and the client terminal, or the time required at the server for handling the commands from the client terminal.

So, in the conventional data reception apparatus 901, display based on the image data stored in the frame memory is performed only after a predetermined latency time (in this case, three seconds) has passed from the start time of the data request to the server.

As a result, in the data reception apparatus 901, it is difficult to display the media data corresponding to each video element at the time designated by the scene description, i.e., the time indicated by the begin attribute included in the video element.

Further, the time required from the request for the image data to the storage of the image data in the frame memory varies depending on the network condition, the number of messages to exchange, and the like. Consequently, the temporal relationship between plural image data varies, making it difficult to maintain synchronization between the plural image data. For example, according to the scene description Sd of figure 12, display of the image corresponding to the video element 705 should be started five seconds after display of the image corresponding to the first video element 704 is started. However, when the time from when the data reception apparatus 901 requests the image data from the server to when the image data is actually stored in the frame memory of the apparatus 901 varies due to various factors such as congestion of the network, there is the possibility that display of the image corresponding to the video element 705 is not started five seconds after display of the image corresponding to the video element 704 is started. This becomes a serious problem when the scene is a composite image comprising plural image data related to each other.
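The loss of synchronization described above can be reproduced with a toy model. The sketch below is a simplified illustration, not the behavior of any actual apparatus: it assumes that apparatus 901 issues each request at the begin time, waits the fixed three-second latency, and cannot display earlier than the actual delay allows, so display starts at begin + max(latency, delay). The function name conventional_display_start and the delay values are hypothetical.

```python
from typing import Dict


def conventional_display_start(begin_s: float, delay_s: float,
                               fixed_latency_s: float = 3.0) -> float:
    """Earliest display time, in seconds from scene start (simplified model).

    The request goes out at begin_s; the apparatus waits fixed_latency_s, and it
    cannot show the image before the data actually arrives after delay_s.
    """
    return begin_s + max(fixed_latency_s, delay_s)


# begin attributes of video elements 704 and 705 (figure 12)
begins: Dict[str, float] = {"704": 5.0, "705": 10.0}
# hypothetical request-to-frame-memory delays; real values depend on the network
delays: Dict[str, float] = {"704": 2.5, "705": 4.0}

starts = {name: conventional_display_start(b, delays[name]) for name, b in begins.items()}
offset = starts["705"] - starts["704"]
print(starts, offset)  # {'704': 8.0, '705': 14.0} 6.0
```

With these assumed delays, two images meant to start five seconds apart start six seconds apart, and both start later than their begin times.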

Furthermore, when the media data is transmitted through a network for which a bandwidth (i.e., a constant data transmission rate) is not assured, such as the Internet, the image decoding unit of the data reception apparatus 901 must wait from several seconds to more than ten seconds until a predetermined quantity of received image data is stored in the data buffer before starting decoding on the received image data. The process of storing a predetermined quantity of received image data in the data buffer of the data reception unit until decoding on the image data is started by the image decoding unit is called "prebuffering".

When the prebuffering is not performed, the decoding process in the data reception apparatus is easily affected by jitter in the network (fluctuations in transmission rate). For example, when decoding is performed for every predetermined quantity of image data, the image data to be decoded may not yet be stored when decoding is to be performed, resulting in the state where playback of the image data is interrupted.
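The effect of prebuffering on jitter can be illustrated with a small buffer simulation. This is a sketch under simplifying assumptions (discrete ticks, a fixed decoding demand per tick, made-up arrival values); count_underflows is a hypothetical name, and the model is not the decoder described in this specification.

```python
from typing import List


def count_underflows(arrivals: List[int], consume_per_tick: int,
                     prebuffer_units: int) -> int:
    """Count ticks on which the decoder finds too little data in the buffer.

    arrivals[t] units are received during tick t. Decoding starts only once the
    buffer holds prebuffer_units (0 means no prebuffering); after that the decoder
    needs consume_per_tick units on every tick.
    """
    buffered = 0
    started = False
    underflows = 0
    for arrived in arrivals:
        buffered += arrived
        if not started and buffered >= prebuffer_units:
            started = True
        if started:
            if buffered >= consume_per_tick:
                buffered -= consume_per_tick
            else:
                underflows += 1  # data not there in time: playback is interrupted
    return underflows


# Hypothetical jittery arrivals; the average rate roughly matches consumption.
arrivals = [2, 0, 2, 0, 0, 3, 1, 0, 2, 1, 1]
print(count_underflows(arrivals, consume_per_tick=1, prebuffer_units=0))  # 1
print(count_underflows(arrivals, consume_per_tick=1, prebuffer_units=3))  # 0
```

The same arrival trace is used in both calls; only the threshold at which decoding starts differs.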

Accordingly, when the time required for exchange of messages with the server or for prebuffering is considered, the conventional data reception apparatus 901, which issues a message requesting each image data to the server at the display time of the image data described in the SMIL data, cannot perform normal scene playback according to the scene description.

Moreover, an appropriate prebuffering time varies for every bit stream corresponding to each image data (coded data of each image data). Therefore, the reception terminal (data reception apparatus) cannot set an appropriate prebuffering time, resulting in the possibility that excess or deficiency of image data in the buffer of the data reception unit (i.e., overflow or underflow of the buffer) may occur during decoding on the image data.
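Why one fixed prebuffering time cannot suit every bit stream can be shown by computing, for a given arrival pattern, the earliest tick at which decoding may start without ever underflowing. The sketch assumes a constant decoding rate and only checks the ticks covered by the trace; min_startup_delay and the sample traces are hypothetical.

```python
from itertools import accumulate
from typing import List


def min_startup_delay(arrivals: List[int], rate: int) -> int:
    """Earliest tick d at which decoding can start and never underflow.

    arrivals[t] units arrive during tick t; from tick d on, the decoder consumes
    `rate` units per tick, so by tick t (t >= d) it needs rate * (t - d + 1)
    units in total. Only the ticks covered by the trace are checked.
    """
    received = list(accumulate(arrivals))  # cumulative units received by tick t
    for d in range(len(arrivals)):
        if all(received[t] >= rate * (t - d + 1) for t in range(d, len(arrivals))):
            return d
    return len(arrivals)  # no start tick inside the trace works


smooth = [1, 1, 1, 1, 1, 1]  # hypothetical stream with steady arrivals
bursty = [2, 0, 0, 0, 0, 4]  # same total data, delivered in bursts
print(min_startup_delay(smooth, rate=1))  # 0
print(min_startup_delay(bursty, rate=1))  # 3
```

Both streams carry the same amount of data, yet only the bursty one needs a prebuffer, so a receiver that guesses a single value risks underflow on one stream or needless delay on the other.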

SUMMARY OF THE INVENTION

The present invention is made to solve the above-mentioned problems and has for its object to provide a data reception apparatus, a data reception method, and a data transmission method by which playback and display of plural images constituting a scene can be started at the times designated by scene description data, and playback and display of image data can be performed without interruption regardless of jitter in the network.

It is another object of the present invention to provide a data storage medium containing a program for making a computer perform data reception according to the above-mentioned data reception method.

Other objects and advantages of the invention will become apparent from the detailed description that follows. The detailed description and specific embodiments described are provided only for illustration, since various additions and modifications within the scope of the invention will be apparent to those of skill in the art from the detailed description.

According to a first aspect of the present invention, there is provided a data reception apparatus for obtaining media data, which is any of video data, audio data, and text data and corresponds to plural elements constituting a scene, from data sources on a network, and playing the obtained media data to display the scene. This apparatus comprises a first reception unit for receiving location information indicating the locations of the data sources having the respective media data on the network, first time information indicating the playback start times of the respective media data, and second time information for requesting the respective media data from the corresponding data sources; a time setting unit for setting a data request time to make a request for each media data to the corresponding data source, at a time earlier than the playback start time of the media data by a specific time set for each media data, on the basis of the first and second time information; a data request unit for making a request for each media data to the data source indicated by the location information, at the data request time set by the time setting unit; and a second reception unit for receiving the media data supplied from the data source according to the request from the data request unit. Therefore, playback of each media data can be started on time as designated at the transmitting end.
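The first aspect can be sketched as two small functions: a time setting unit that subtracts a per-element lead time from the playback start time, and a data request unit that waits for each request time and then contacts the data source. This is an illustrative sketch, not the claimed apparatus; ControlEntry, set_request_times, issue_requests, the injectable clock and sleep parameters, and the example.com URLs are hypothetical, and times are seconds relative to the start of scene display.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ControlEntry:
    location: str          # location information of the data source
    playback_start: float  # first time information (s from scene start)
    lead_time: float       # specific time derived from the second time information


def set_request_times(entries: List[ControlEntry]) -> List[Tuple[float, str]]:
    """Time setting unit: request each media data lead_time before it must play."""
    return sorted((e.playback_start - e.lead_time, e.location) for e in entries)


def issue_requests(schedule: List[Tuple[float, str]],
                   send_request: Callable[[str], None],
                   scene_start: float,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Data request unit: wait until each request time, then contact the source.

    scene_start is the clock() value at which scene display begins, so request
    times before the scene (negative values) are respected as long as
    scene_start lies far enough in the future.
    """
    for request_time, location in schedule:
        wait = scene_start + request_time - clock()
        if wait > 0:
            sleep(wait)
        send_request(location)


entries = [
    ControlEntry("rtsp://example.com/a.mpg", playback_start=4.0, lead_time=1.0),
    ControlEntry("rtsp://example.com/b.mpg", playback_start=2.0, lead_time=2.0),
]
print(set_request_times(entries))
# [(0.0, 'rtsp://example.com/b.mpg'), (3.0, 'rtsp://example.com/a.mpg')]
```

Injecting the clock and sleep functions keeps the scheduling logic testable without real waiting; by default they are time.monotonic and time.sleep.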

According to a second aspect of the present invention, in the data reception apparatus of the first aspect, the first reception unit receives, as the second time information, time information indicating a latency time from when each media data is received to when the media data is played; and the time setting unit sets the data request time for each media data at a time earlier than the playback start time of the media data by the latency time. Therefore, the data reception apparatus can obtain each media data from a predetermined data source on the network within the latency time before playback of the media data. Furthermore, by setting the latency time at a sufficiently large value considering the condition of the network through which the media data is transmitted (e.g., bandwidth, congestion, etc.), playback of the media data by the data reception apparatus is hardly affected by jitter in the network, thereby preventing the image display from being interrupted during playback of the media data.

According to a third aspect of the present invention, in the data reception apparatus of the first aspect, the first reception unit receives, as the second time information, time information indicating a time to make a request for each media data to the corresponding data source; and the time setting unit sets the data request time for each media data at the time indicated by the second time information. Therefore, the data reception apparatus can obtain each media data from a predetermined data source on the network within the time from the data request time to the playback start time. Furthermore, by setting the data request time at a time sufficiently earlier than the data playback start time considering the condition of the network through which the media data is transmitted (e.g., bandwidth, congestion, etc.), playback of the media data by the data reception apparatus is hardly affected by jitter in the network, thereby preventing the image display from being interrupted during playback of the media data.

According to a fourth aspect of the present invention, in the data reception apparatus of the first aspect, the first reception unit receives, as the second time information, time information indicating a latency time from when each media data is received to when the media data is played; and the time setting unit sets the data request time for each media data at a time earlier than the playback start time of the media data by the sum of the latency time and a predetermined time. Therefore, playback of each media data can be started on time as designated at the transmitting end. Further, playback of media data at the receiving end is hardly affected by jitter in the network, thereby preventing image display from being interrupted during playback of the media data.

According to a fifth aspect of the present invention, in the data reception apparatus of the first aspect, the first reception unit receives, as the second time information, time information indicating a time to make a request for each media data to the corresponding data source; and the time setting unit sets the data request time for each media data at a time earlier than the time indicated by the second time information by a predetermined time. Therefore, playback of each media data can be started on time as designated at the transmitting end. Further, playback of media data at the receiving end is hardly affected by jitter in the network, thereby preventing image display from being interrupted during playback of the media data.
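The second through fifth aspects differ only in how the data request time is derived from the second time information. The hypothetical helper below puts the four variants side by side; SecondTimeKind, data_request_time, and the parameter margin (standing for the "predetermined time" of the fourth and fifth aspects) are names assumed for illustration.

```python
from enum import Enum


class SecondTimeKind(Enum):
    LATENCY = "latency"            # second time information is a latency time
    REQUEST_TIME = "request_time"  # second time information is a data request time


def data_request_time(playback_start: float, second_time: float,
                      kind: SecondTimeKind, margin: float = 0.0) -> float:
    """Data request time in seconds relative to scene start.

    margin stands for the receiver's own "predetermined time": margin == 0
    corresponds to the second and third aspects, margin > 0 to the fourth
    and fifth aspects.
    """
    if kind is SecondTimeKind.LATENCY:
        # request earlier than playback start by the latency (plus margin)
        return playback_start - (second_time + margin)
    # the transmitting end already chose the request time; playback_start is unused
    return second_time - margin


print(data_request_time(5.0, 7.0, SecondTimeKind.LATENCY))              # -2.0
print(data_request_time(5.0, 7.0, SecondTimeKind.LATENCY, margin=1.0))  # -3.0
print(data_request_time(5.0, -2.0, SecondTimeKind.REQUEST_TIME, margin=0.5))  # -2.5
```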

According to a sixth aspect of the present invention, there is provided a data reception method for obtaining media data, which is any of video data, audio data, and text data and corresponds to plural elements constituting a scene, from data sources on a network, and playing the obtained media data to display the scene. This method comprises a first reception step of receiving location information indicating the locations of the data sources having the respective media data on the network, first time information indicating the playback start times of the respective media data, and second time information for requesting the respective media data from the corresponding data sources; a data request step of making a request for each media data to the data source indicated by the location information, at a time earlier than the playback start time of the media data by a specific time set for each media data, on the basis of the first and second time information; and a second reception step of receiving the media data supplied from the data source according to the request made in the data request step. Therefore, playback of media data corresponding to each of the elements constituting a scene can be started on time as designated at the transmitting end.

According to a seventh aspect of the present invention, in the data reception method of the sixth aspect, the first reception step receives, as the second time information, time information indicating a latency time from when each media data is received to when the media data is played; and the data request step makes a request for each media data to a predetermined data source at a time earlier than the playback start time of the media data by the latency time. Therefore, the receiving end can obtain each media data from a predetermined data source on the network within the latency time before playback of the media data. Furthermore, by setting the latency time at a sufficiently large value considering the condition of the network through which the media data is transmitted (e.g., bandwidth, congestion, etc.), playback of the media data at the receiving end is hardly affected by jitter in the network, thereby preventing the image display from being interrupted during playback of the media data.

According to an eighth aspect of the present invention, in the data reception method of the sixth aspect, the first reception step receives, as the second time information, time information indicating a data request time to make a request for each media data to the corresponding data source; and the data request step makes a request for each media data to the data source at the data request time. Therefore, the receiving end can obtain each media data from a predetermined data source on the network within the time from the data request time to the playback start time. Furthermore, by setting the data request time at a time sufficiently earlier than the data playback start time considering the condition of the network through which the media data is transmitted (e.g., bandwidth, congestion, etc.), playback of media data at the receiving end is hardly affected by jitter in the network, thereby preventing the image display from being interrupted during playback of the media data.

According to a ninth aspect of the present invention, in the data reception method of the sixth aspect, the first reception step receives, as the second time information, time information indicating a latency time from when each media data is received to when the media data is played; and the data request step makes a request for each media data to a predetermined data source at a time earlier than the playback start time of the media data by the sum of the latency time and a predetermined time. Therefore, playback of each media data can be started on time as designated at the transmitting end. Further, playback of media data at the receiving end is hardly affected by jitter in the network, thereby preventing image display from being interrupted during playback of the media data.

According to a tenth aspect of the present invention, in the data reception method of the sixth aspect, the first reception step receives, as the second time information, time information indicating a data request time to make a request for each media data to the corresponding data source; and the data request step makes a request for each media data to the data source at a time earlier than the data request time by a predetermined time. Therefore, playback of each media data can be started on time as designated at the transmitting end. Further, playback of media data at the receiving end is hardly affected by jitter in the network, thereby preventing image display from being interrupted during playback of the media data.

According to an eleventh aspect of the present invention, there is provided a data transmission method for transmitting media data, which is any of video data, audio data, and text data and corresponds to plural elements constituting a scene, to a reception terminal for playing the media data to display the scene. This method comprises a first transmission step of transmitting location information indicating the locations of data sources having the respective media data on a network, first time information indicating the playback start times of the respective media data, and second time information for requesting the respective media data from the corresponding data sources; and a second transmission step of transmitting the media data to the reception terminal according to the request for the media data, which is issued from the reception terminal on the basis of the first and second time information and the location information. Therefore, the receiving end can obtain each media data from a predetermined data source on the network on the basis of the second time information before playback of the media data, so as to start playback of the media data on time as designated at the transmitting end.
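For the first transmission step, the transmitting end can carry the location information and both kinds of time information in the scene description itself, as the first embodiment does with the src, begin, and prebuffering attributes (figures 9 and 10 instead describe carrying the prebuffering time in SDP or in the acknowledgement to an RTSP SETUP request). The sketch below only serializes such an element with Python's standard xml.etree.ElementTree; make_video_element is a hypothetical helper, and prebuffering is the attribute used in this specification rather than a standard SMIL attribute.

```python
import xml.etree.ElementTree as ET


def make_video_element(region: str, src: str, begin_s: float,
                       prebuffering_s: float) -> str:
    """Serialize one <video> element carrying location and time information."""
    attrs = {
        "region": region,
        "src": src,                              # location information
        "begin": f"{begin_s:g}s",                # first time information (playback start)
        "prebuffering": f"{prebuffering_s:g}s",  # second time information (latency time)
    }
    return ET.tostring(ET.Element("video", attrs), encoding="unicode")


print(make_video_element("adv", "rtsp://s2.com/adv.mpg", 5, 7))
```

The second transmission step, serving the media data when the request arrives, is left to the streaming server and is not sketched here.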

According to a twelfth aspect of the present invention, in the data transmission method of the eleventh aspect, the second time information is time information indicating a latency time from when each media data is received to when the media data is played. Therefore, the receiving end can obtain each media data from a predetermined data source on the network within the latency time before playback of the media data. Furthermore, by setting the latency time at a sufficiently large value considering the condition of the network through which the media data is transmitted (e.g., bandwidth, congestion, etc.), playback of the media data at the receiving end is hardly affected by jitter in the network, thereby preventing the image display from being interrupted during playback of the media data.

According to a thirteenth aspect of the present invention, in the data transmission method of the eleventh aspect, the second time information is time information indicating a data request time to make a request for each media data to the corresponding data source. Therefore, the receiving end can obtain each media data from a predetermined data source on the network within the time from the data request time to the playback start time. Furthermore, by setting the data request time at a time sufficiently earlier than the data playback start time considering the condition of the network through which the media data is transmitted (e.g., bandwidth, congestion, etc.), playback of media data at the receiving end is hardly affected by jitter in the network, thereby preventing the image display from being interrupted during playback of the media data.

According to a fourteenth aspect of the present invention, there is provided a data storage medium containing a data playback program to make a computer perform a data playback process of obtaining media data, which is any of video data, audio data, and text data and corresponds to plural elements constituting a scene, from data sources on a network, and playing the obtained media data to display the scene. This data playback program comprises a first program to make the computer perform a first process of receiving location information indicating the locations of the data sources having the respective media data, first time information indicating the playback start times of the respective media data, and second time information for requesting the respective media data from the corresponding data sources; a second program to make the computer perform a second process of making a request for each media data to the data source indicated by the location information, at a time earlier than the playback start time of the media data by a specific time set for each media data, on the basis of the first and second time information; and a third program to make the computer perform a third process of receiving the media data supplied from the data source according to the data request. Therefore, the receiving end is permitted to perform, by software, the process of playing media data corresponding to each of the elements constituting a scene on time as designated at the transmitting end.

According to a fifteenth aspect of the present invention, there is provided a data storage medium which contains a data transmission program to make a computer perform a data transmission process of transmitting media data, which is any of video data, audio data, and text data and corresponds to plural elements constituting a scene, to a reception terminal for playing the media data to display the scene. This data transmission program comprises a first program to make the computer perform a first process of transmitting location information indicating the locations of data sources having the respective media data on a network, first time information indicating the playback start times of the respective media data, and second time information for requesting the respective media data from the corresponding data sources; and a second program to make the computer perform a second process of transmitting the media data to the reception terminal according to the request for the media data, which is issued from the reception terminal on the basis of the first and second time information and the location information. Therefore, the transmitting end is permitted to perform, by software, the process of transmitting each media data to the receiving end so that playback of the media data at the receiving end is performed on time as designated by the transmitting end.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram for explaining a data reception apparatus according to a first embodiment of the present invention.

Figure 2 is a diagram illustrating the contents (scene description) of SMIL data supplied to the data reception apparatus of the first embodiment.

Figures 3(a) and 3(b) are diagrams illustrating the spatial arrangement (3(a)) and the temporal arrangement (3(b)) of media data on the basis of the SMIL data supplied to the data reception apparatus of the first embodiment.

Figure 4 is a diagram illustrating a time table which is created by a control data recording unit 103 included in the data reception apparatus of the first embodiment.

Figure 5 is a diagram for explaining the flow of the procedure to obtain media data from a server by the data reception apparatus of the first embodiment.

Figure 6 is a flowchart illustrating the process of calculating the time to issue a media data request command in the data reception apparatus of the first embodiment.

Figure 7 is a block diagram for explaining a data reception apparatus according to a second embodiment of the present invention.

Figure 8 is a diagram illustrating the contents (scene description) of SMIL data supplied to the data reception apparatus of the second embodiment.

Figure 9 is a diagram for explaining, as a data transmission method according to the present invention, a method of transmitting information indicating a prebuffering time (latency time), which information is included in SDP.

Figure 10 is a diagram for explaining, as a data transmission method according to the present invention, a method of transmitting information indicating a prebuffering time (latency time), which information is included in an acknowledgement to a SETUP request of RTSP.

Figures 11(a) and 11(b) are diagrams illustrating the spatial arrangement (11(a)) and the temporal arrangement (11(b)) of media data on the basis of SMIL data supplied to a conventional data reception apparatus.

Figure 12 is a diagram illustrating the contents (scene description) of the SMIL data supplied to the conventional data reception apparatus.

Figure 13 is a block diagram for explaining the conventional data reception apparatus.

Figure 14 is a diagram for explaining the flow of the procedure to obtain media data from a server by the conventional data reception apparatus.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[Embodiment 1] 

Figure 1 is a block diagram for explaining a data reception apparatus 110 according to a first embodiment of the present invention.

The data reception apparatus 110 receives SMIL data as scene description data, reproduces a composite image comprising one background image and two foreground images on the basis of the contents of the SMIL data, and displays the composite image.

To be specific, the data reception apparatus 110 includes a SMIL request/reception unit 102 and a control data generation unit 110a. The SMIL request/reception unit 102 outputs a request signal Srd for requesting a predetermined server to transmit SMIL data Dsl on the basis of third control data, receives the SMIL data Dsl supplied from the server, and analyzes the SMIL data Dsl. The control data generation unit 110a generates first and second control data Dc1 and Dc2 on the basis of analysis data Dal obtained by the analysis of the SMIL data in the SMIL request/reception unit 102.

The data reception apparatus 110 further includes a media data reception unit 106a for receiving image data (media data) Dm1 corresponding to a first foreground image from the server; a decoding unit 107a for decoding the received image data Dm1 to output decoded image data Dd1; and a frame memory 108a for storing the decoded image data Dd1 in units of frames. Further, the data reception apparatus 110 includes a media data reception unit 106b for receiving image data (media data) Dm2 corresponding to a second foreground image from the server; a decoding unit 107b for decoding the received image data Dm2 to output decoded image data Dd2; and a frame memory 108b for storing the decoded image data Dd2 in units of frames.

Furthermore, the data reception apparatus 110 includes a display unit 109 for reading the decoded image data Dd1 and Dd2 respectively stored in the frame memories 108a and 108b on the basis of the first control data Dc1 supplied from the control data generation unit 110a, combining these data with a background image to generate a composite image, and displaying the composite image. The data reception apparatus 110 further includes a data request/reception unit 105 for outputting a data request signal Srp for requesting data from a predetermined server on the basis of the second control data Dc2 supplied from the control data generation unit 110a, and receiving an acknowledge signal Sack to the data request from the server.

The control data generation unit 110a comprises a control data recording unit 103 and a trigger signal generation unit 104. On the basis of the analysis data Da from the SMIL request/reception unit 102, the control data recording unit 103 creates a time table in which plural items, each comprising a control command to be output as control data to the data request/reception unit 105 and the display unit 109 and information relating to the command, are arranged in order of the time to execute the command, and it outputs time information It relating to the execution time of each control command in order of time. On receipt of the time information It, the trigger signal generation unit 104 sets the execution time of the control command corresponding to each item, starts a clocking operation, and outputs a trigger signal St to the control data recording unit 103 every time the clock time reaches a set control command execution time. Every time the control data recording unit 103 receives the trigger signal St from the trigger signal generation unit 104, the unit 103 outputs the corresponding control command either to the data request/reception unit 105 as the control data Dc2 or to the display unit 109 as the control data Dc1.

In figure 1, reference numeral 101a denotes a clock circuit for supplying a reference clock to each component of the data reception apparatus 110, and this is identical to the clock circuit of the conventional data reception apparatus 901.

The trigger signal generation unit 104 may be implemented by a timer which is able to set plural times, performs a clocking operation on the basis of the reference clock from the clock circuit 101a, and outputs a trigger signal every time the clock time reaches a set time.
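The time table and the trigger signal generation unit 104 together behave like a priority queue keyed on execution time. The sketch below models that with Python's heapq; TableItem, TriggerTimer, advance_to, and the command and target strings are hypothetical, and the clock is simulated rather than driven by the clock circuit 101a. The example times use the reception start times stated later for the first embodiment (two and five seconds before scene display) and the begin attributes of 5 s and 10 s.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(order=True)
class TableItem:
    exec_time: float                     # execution time, s from scene start
    command: str = field(compare=False)  # hypothetical command label
    target: str = field(compare=False)   # unit that receives the control data


class TriggerTimer:
    """Holds plural set times and releases the items whose time has come."""

    def __init__(self, items: Iterable[TableItem]) -> None:
        self._heap: List[TableItem] = list(items)
        heapq.heapify(self._heap)  # ordered by exec_time, like the time table

    def advance_to(self, clock_time: float) -> List[TableItem]:
        """Fire a trigger for every item whose execution time is <= clock_time."""
        fired = []
        while self._heap and self._heap[0].exec_time <= clock_time:
            fired.append(heapq.heappop(self._heap))
        return fired


timer = TriggerTimer([
    TableItem(5.0, "display adv", "display_unit_109"),
    TableItem(-2.0, "request adv.mpg", "data_request_unit_105"),
    TableItem(10.0, "display mov", "display_unit_109"),
    TableItem(-5.0, "request mov.mpg", "data_request_unit_105"),
])
for t in (-5.0, -2.0, 5.0, 10.0):
    for item in timer.advance_to(t):
        print(t, item.target, item.command)
```

A heap keeps insertion cheap when new items are added while the scene is playing; a presorted list would serve equally well for a fixed table.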

In this first embodiment, the data reception apparatus 110 includes two media data reception units, two decoding units, and two frame memories, obtains media data corresponding to two foreground images from the server on the network, and combines the two foreground images on one background image to display a composite image. However, the number of media data obtained from the server on the network is not restricted to two. For example, in the case where the data reception apparatus 110 obtains three or more media data from the server on the network, the apparatus 110 is provided with as many data reception units, decoding units, and frame memories as the number of media data to be obtained.

Figure 2 is a diagram illustrating an example of the above-mentioned SMIL data, and the data reception apparatus 110 of this first embodiment receives the SMIL data shown in figure 2. Figures 3(a) and 3(b) illustrate the spatial arrangement and the temporal arrangement, respectively, of the media data as the contents of the SMIL data shown in figure 2.

In figure 2, the character strings <smil>, </smil>, <head>, </head>, <layout>, </layout>, <root-layout>, <region>, <body>, <par>, </par>, and <video>, which are described at the heads of the respective rows of the scene description SD1, are called "elements" and declare the contents of the description following the respective elements. That is, the elements 210a, 210b, 220a, 220b, 230a, 230b, 240a, 240b, 250a, and 250b shown in figure 2 are identical to the elements 710a, 710b, 720a, 720b, 730a, 730b, 740a, 740b, 750a, and 750b shown in figure 12, respectively. Further, the rows 201 to 203 shown in figure 2 are identical to the rows 701 to 703 shown in figure 12, respectively. However, the rows 204 and 205 shown in figure 2 are different from the rows 704 and 705 shown in figure 12, respectively.

First of all, the spatial arrangement of media data designated by the SMIL data will be described with reference to figure 2.

The root-layout element 201a designates the size of the entire scene. That is, the root-layout element 201a indicates the size of the rectangle region where the entire scene is displayed: by the width attribute (width="300") and the height attribute (height="200") attached to this element, it indicates that the width and the height of the rectangle region are 300 points and 200 points, respectively. Further, the id attribute relating to this element 201a shows the background (bg) (id="bg").

The region element 202a indicates the size of the rectangle region where an image corresponding to this element 202a is displayed: by the width attribute (width="300") and the height attribute (height="50") attached to the region element 202a, it indicates that the width and the height of the rectangle region are 300 points and 50 points, respectively. Further, the region element 202a indicates, by the left attribute (left="0") and the top attribute (top="150") attached to it, that the upper left edge of the rectangle region is positioned at a distance of 0 points from the left edge of the image space 1100 and 150 points from the upper edge of the image space 1100. Further, the id attribute attached to this element 202a indicates the first foreground image (adv) (id="adv").

The region attribute attached to the video element 204a indicates the first foreground image (adv) (region="adv").

Accordingly, the rectangle region whose size and position are designated by the region element 202a is a region where the first foreground image (adv) is placed (hereinafter also referred to as an adv region).

The region element 203a indicates, by the width attribute (width="200") and the height attribute (height="150") attached to this element, that the width and the height of the corresponding rectangle region are 200 points and 150 points, respectively. Further, the region element 203a indicates, by the left attribute (left="50") and the top attribute (top="0") attached to this element, that the upper left edge of this rectangle region is positioned at a distance of 50 points from the left edge of the image space 1100 and 0 points from the upper edge of the image space 1100. The id attribute attached to this element 203a indicates the second foreground image (mov) (id="mov").

The region attribute attached to the video element 205a indicates the second foreground image (mov) (region="mov"). Accordingly, the rectangle region whose size and position are designated by the region element 203a is a region where the second foreground image (mov) is placed (hereinafter also referred to as a mov region).

The bg region is a region serving as a background, the adv region is a region where an advertisement or the like is displayed, and the mov region is a region where a moving image or the like is displayed.

Consequently, as shown in figure 3(a), the positions of the adv region 1120, the mov region 1130, and the bg region 1110 based on the scene description SD1 are identical to the positions of the regions shown in figure 11(a).

More specifically, the predetermined image space 1100 is the background display region (bg region) 1110 where the background image (bg) is displayed. The first foreground display region (adv region) 1120, where the first foreground image (adv) such as an advertisement is placed, and the second foreground display region (mov region) 1130, where the second foreground image (mov) such as a moving picture is placed, are placed in the background display region 1110. The sizes of the regions where the respective images are placed and their positions in the image space are identical to those shown in figure 11(a).
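The spatial arrangement above can be checked mechanically. The sketch below parses a hypothetical SMIL layout fragment reconstructed from the attribute values quoted in the text (it is not a copy of figure 2), then verifies that each region lies inside the root layout and that the adv and mov regions do not overlap. parse_layout, inside, and overlaps are illustrative names; only Python's standard xml.etree.ElementTree is used.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical fragment built from the width/height/left/top values in the text.
LAYOUT = """<smil><head><layout>
  <root-layout id="bg" width="300" height="200"/>
  <region id="adv" left="0" top="150" width="300" height="50"/>
  <region id="mov" left="50" top="0" width="200" height="150"/>
</layout></head></smil>"""

Rect = Tuple[int, int, int, int]  # (left, top, width, height) in points


def parse_layout(xml_text: str) -> Tuple[Rect, Dict[str, Rect]]:
    """Return the scene rectangle and a mapping of region id to rectangle."""
    root = ET.fromstring(xml_text)
    rl = next(root.iter("root-layout"))
    scene = (0, 0, int(rl.get("width")), int(rl.get("height")))
    regions = {
        r.get("id"): (int(r.get("left", "0")), int(r.get("top", "0")),
                      int(r.get("width")), int(r.get("height")))
        for r in root.iter("region")
    }
    return scene, regions


def inside(inner: Rect, outer: Rect) -> bool:
    l, t, w, h = inner
    ol, ot, ow, oh = outer
    return ol <= l and ot <= t and l + w <= ol + ow and t + h <= ot + oh


def overlaps(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


scene, regions = parse_layout(LAYOUT)
print(scene, regions)
```

The adv region (top 150, height 50) starts exactly where the mov region (top 0, height 150) ends, so the two touch but do not overlap.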

Next, a description will be given of the temporal arrangement of the media data designated by the SMIL data shown in figure 2.

The begin attribute (begin="5s") relating to the video element 204a indicates that display of the image data corresponding to this element 204a should be started five seconds after scene display is started.

The src attribute (src="rtsp://s2.com/adv.mpg") relating to the video element 204a indicates that the image data corresponding to this video element 204a should be obtained by issuing a command, using RTSP, that requests the server (s2.com) to transmit the image data (adv.mpg) stored in this server.

On the other hand, the begin attribute (begin="10s") relating to the video element 205a indicates that display of the image data corresponding to this element 205a should be started ten seconds after scene display is started.

The src attribute (src="rtsp://s3.com/mov.mpg") relating to the video element 205a indicates that the image data corresponding to this video element 205a should be obtained by issuing a command requesting the server (s3.com) to transmit the image data (mov.mpg) stored in this server, by using RTSP.

Consequently, as shown in figure 3(b), display of the first foreground image (adv) is started when five seconds have passed from the start of scene (background image) display, and display of the second foreground image (mov) is started when ten seconds have passed from the start of scene display.

Further, each of the video elements 204a and 205a has a prebuffering attribute. The prebuffering attribute indicates the latency time from reception of the media data to decoding of it. For example, the prebuffering attribute (prebuffering="7s") relating to the video element 204a indicates that the image data (adv.mpg) corresponding to the video element 204a should wait seven seconds for decoding after it is received by the data reception apparatus. The prebuffering attribute (prebuffering="15s") relating to the video element 205a indicates that the image data (mov.mpg) corresponding to the video element 205a should wait fifteen seconds for decoding after it is received by the data reception apparatus.
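The timing attributes described so far can be read mechanically from the scene description. The Python sketch below parses a hypothetical SMIL fragment built from the values quoted in this description; the full markup of figure 2 is not reproduced here, so the element nesting and the exact layout of the fragment are assumptions. Note that prebuffering is the extension attribute introduced by this description, not an attribute of W3C SMIL 1.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: attribute values follow the text, the structure is assumed.
SCENE_SMIL = """
<smil>
  <head>
    <layout>
      <root-layout id="bg" width="300" height="200"/>
      <region id="adv" left="0" top="150" width="300" height="50"/>
      <region id="mov" left="50" top="0" width="200" height="150"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="rtsp://s2.com/adv.mpg" region="adv" begin="5s" prebuffering="7s"/>
      <video src="rtsp://s3.com/mov.mpg" region="mov" begin="10s" prebuffering="15s"/>
    </par>
  </body>
</smil>
"""

def seconds(value: str) -> float:
    """Convert a clock value such as '7s' to seconds (only the plain 's' form is handled)."""
    return float(value.rstrip("s"))

def video_timing(smil_text: str) -> list[dict]:
    """Collect src, region, begin and prebuffering for every video element."""
    root = ET.fromstring(smil_text)
    return [
        {
            "src": v.get("src"),
            "region": v.get("region"),
            "begin": seconds(v.get("begin", "0s")),
            "prebuffering": seconds(v.get("prebuffering", "0s")),
        }
        for v in root.iter("video")
    ]

for video in video_timing(SCENE_SMIL):
    print(video)
```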

In the data reception apparatus 110 according to this embodiment, when the scene description data SD1 is received, a time table that takes the latency times of the respective elements into account is created and stored in the control data generation unit 110a.

On this time table, the times to issue control commands are set so that receptions of the image data (adv.mpg) and (mov.mpg) corresponding to the video elements 204a and 205a are started two seconds and five seconds before the start of scene display, respectively, and displays of the image data (adv.mpg) and (mov.mpg) are started at times Tadv (Tadv=5 sec.) and Tmov (Tmov=10 sec.), after the latency times of seven seconds and fifteen seconds have passed from the start of reception for the video elements 204a and 205a, respectively.

Figure 4 shows the contents of a time table to be created by the control data generation unit 110a as the contents of control data.

The time table Tab has an item indicating the time to perform data request or data display, an item indicating the data request/reception unit 105 or the display unit 109 as a control target to which a control command is issued, and an item indicating the control command to the control target. A plurality of events, each having the items of time, control target, and control command, are listed in chronological order. For an event whose control target is the data request/reception unit 105, information designated by the src attribute relating to the video element of the SMIL data is described in the item of the control command. For an event whose control target is the display unit 109, information designated by the id, width, height, left, and top attributes relating to the root-layout element or the region element of the SMIL data is described in the item of the control command.
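One row of the time table can be sketched as a small record type. In the Python sketch below, the `Event` class, the target labels and the `display_command` helper are hypothetical names chosen for illustration; the command strings reproduce the `id//left.../top.../width.../height...` notation used for the display control commands later in this description, with keyword order fixing the field order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    time: float    # seconds relative to the scene display start (0 s)
    target: str    # "request" = data request/reception unit 105, "display" = display unit 109
    command: str   # control command built from the SMIL attributes

def display_command(region_id: str, **geometry: int) -> str:
    """Join a region id and its geometry attributes in the 'id//left0/top150/...' form."""
    return region_id + "//" + "/".join(f"{key}{value}" for key, value in geometry.items())

bg = Event(0, "display", display_command("bg", width=300, height=200))
adv = Event(5, "display", display_command("adv", left=0, top=150, width=300, height=50))
print(bg.command)   # bg//width300/height200
print(adv.command)  # adv//left0/top150/width300/height50
```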

Hereinafter, the operation of the data reception apparatus 110 will be described.

Figure 5 is a diagram for explaining the flow of a procedure by which the data reception apparatus 110 obtains media data from the server. More specifically, figure 5 illustrates the exchange of messages between the data reception apparatus and the server, and the transmission of media data from the server to the data reception apparatus.

It is assumed that the data reception apparatus 110 is mounted on a personal computer as a client terminal, and the data reception apparatus 110 is supplied with SMIL data Ds1 as data indicating the scene description SD1 shown in figure 2.

When the user, who is viewing a home page described in HTML (Hyper Text Markup Language) using a Web browser installed on the personal computer, clicks a region on the home page linked to a predetermined scene description SD1 (user operation), the data reception apparatus 110 of the client terminal issues an SMIL request command (GET http://s1.com/scene.smil) C1 requesting SMIL data Ds1 indicating the scene description SD1. This command C1 requests the server (s1.com) 13a to distribute the SMIL data by HTTP.

On receipt of the SMIL request command C1, the server 13a issues an acknowledge (HTTP/1.0 OK) R1 indicating that the command C1 has been accepted, to the client terminal, and transmits the SMIL data (scene.smil) Ds1 to the client terminal.

In the data reception apparatus 110 of the client terminal, the SMIL request/reception unit 102 receives the SMIL data Ds1 and analyzes it.

The SMIL analysis data Da1 obtained by the analysis of the SMIL data is transmitted to the control data generation unit 110a to be stored in the control data recording unit 103. Then, the control data recording unit 103 creates the time table Tab shown in figure 4 on the basis of the SMIL analysis data Da1, whereby the contents of the SMIL data are stored in the form of a time table.

Hereinafter, the process of creating the time table by the control data recording unit 103 will be described briefly.

Initially, in the control data recording unit 103, the time to issue a control command for requesting the media data corresponding to each video element is obtained by using the display start time indicated by the begin attribute of each video element and the latency time (prebuffering time) indicated by the prebuffering attribute of each video element. The time to issue a control command requesting media data is obtained by subtracting the latency time from the display start time. To be specific, the time Tpadv to issue a control command requesting the media data (adv.mpg) corresponding to the video element 204a is -2 sec. (5 sec. - 7 sec.) with reference to the scene display start time (Tbg=0 sec.), and the time Tpmov to issue a control command requesting the media data (mov.mpg) corresponding to the video element 205a is -5 sec. (10 sec. - 15 sec.).

Thereafter, in the control data recording unit 103, on the basis of the SMIL analysis data Da1, the contents of the SMIL data are sorted into two groups, i.e., information required to request the media data (information designated by the src attribute included in the video element), and information required to display the media data (information designated by the id, width, height, left, and top attributes included in the root-layout element or the region element).

Next, in the control data recording unit 103, event data E1 is created on the basis of the information to request the media data, which event data comprises information indicating a control command to request the media data (mov.mpg), information indicating the data request unit as a target of the control command, and information indicating the time to issue the control command. Further, event data E2 is created, which comprises information indicating a control command to request the media data (adv.mpg), information indicating the data request unit as a target of the control command, and information indicating the time to issue the control command.

Furthermore, on the basis of the information to display the media data, the following event data are created: event data E3 comprising information indicating a control command to display the background image (bg), information indicating the display unit as a target of the control command, and information indicating the time to issue the control command; event data E4 comprising information indicating a control command to display the first foreground image (adv), information indicating the display unit as a target of the control command, and information indicating the time to issue the control command; and event data E5 comprising information indicating a control command to display the second foreground image (mov), information indicating the display unit as a target of the control command, and information indicating the time to issue the control command.

Thereafter, in the control data recording unit 103, the respective event data are arranged according to their control command issue times (chronological order) to create the time table shown in figure 4, and the time table so created is stored.
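Arranging the event data in chronological order is an ordinary sort on the time item. The sketch below uses the issue times and commands given in this description for the event data E1 to E5; holding each event as a `(time, target, command)` tuple in a dictionary is an assumption made for brevity.

```python
# Event data keyed by name, deliberately listed out of order.
events = {
    "E3": (0, "display", "bg//width300/height200"),
    "E5": (10, "display", "mov//left50/top0/width200/height150"),
    "E2": (-2, "request", "PLAY rtsp://s2.com/adv.mpg"),
    "E4": (5, "display", "adv//left0/top150/width300/height50"),
    "E1": (-5, "request", "PLAY rtsp://s3.com/mov.mpg"),
}

# The time table is the list of events sorted by their issue time.
time_table = sorted(events.items(), key=lambda item: item[1][0])

for name, (time, target, command) in time_table:
    print(f"{time:>4} s  {name}  {target:<8} {command}")
```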

To be specific, in the scene description SD1 shown in figure 2, the times to request the media data (adv.mpg) and (mov.mpg) corresponding to the video elements 204a and 205a (the times to issue control commands) are set at -2 sec. and -5 sec., respectively. The display start times of the foreground images (adv) and (mov) are set at 5 sec. and 10 sec., respectively, and the display start time of the scene (background image) is 0 sec. Accordingly, as shown in figure 4, on the time table stored in the control data recording unit 103, the first event data is the event data E1, the second event data is the event data E2, the third event data is the event data E3, the fourth event data is the event data E4, and the fifth event data is the event data E5.

Thereafter, the control data recording unit 103 outputs information indicating the issue times of the respective control commands (time information) from the time table to the signal generation unit 104, in order from the top of the table.

When the time information is input to the signal generation unit 104, the time indicated by the time information is recorded as a set time in order of reception, and the clock of the signal generation unit 104 starts operating.
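The behaviour of the signal generation unit 104 can be modelled as a store of set times plus a clock that fires a trigger whenever it passes one of them. The `SignalGenerator` class below is a hypothetical sketch, not the described hardware: it uses a simulated one-second clock that starts before -5 s instead of a real clock, and keeps the set times in a min-heap (a plain index into the already sorted time table would work equally well).

```python
import heapq

class SignalGenerator:
    """Hypothetical model of the signal generation unit 104."""

    def __init__(self) -> None:
        self._set_times: list[float] = []   # min-heap of pending set times

    def add_set_time(self, t: float) -> None:
        heapq.heappush(self._set_times, t)

    def advance_to(self, clock: float) -> list[float]:
        """Move the clock to `clock` and return one trigger per set time reached."""
        fired = []
        while self._set_times and self._set_times[0] <= clock:
            fired.append(heapq.heappop(self._set_times))
        return fired

gen = SignalGenerator()
for t in (-5, -2, 0, 5, 10):        # time information from the time table
    gen.add_set_time(t)

for clock in range(-6, 11):          # simulated one-second clock ticks
    for t in gen.advance_to(clock):
        print(f"clock {clock:>3} s: trigger St for set time {t} s")
```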

At this time, in the data reception apparatus 110, simultaneously with the creation of the time table, the data reception apparatus and a predetermined server exchange messages so as to set up transmission of image data at the server.

To be specific, as shown in figure 5, the data reception apparatus 110 of the client terminal issues a command (DESCRIBE rtsp://s3.com/mov.mpg) C2a requesting specific information relating to the media data corresponding to the second foreground image (mov) (e.g., coding condition, existence of plural candidate data, etc.), to the third server (s3.com) 13c.

On receipt of the command C2a, the third server 13c issues an acknowledge (RTSP/1.0 OK) R2a indicating that the command C2a has been accepted, to the client terminal, and transmits SDP (Session Description Protocol) information to the client terminal.

Next, the data reception apparatus 110 of the client terminal issues a setup request command (SETUP rtsp://s3.com/mov.mpg) C3a, which requests the third server (s3.com) 13c to set up provision of the media data corresponding to the second foreground image (mov), to the third server 13c. Upon completion of setup for the media data, the third server 13c issues an acknowledge (RTSP/1.0 OK) R3a indicating that the command C3a has been accepted, to the client terminal.

Thereafter, the data reception apparatus 110 of the client terminal issues a command (DESCRIBE rtsp://s2.com/adv.mpg) C2b requesting specific information relating to the media data corresponding to the first foreground image (adv) (e.g., coding condition, existence of plural candidate data, etc.), to the second server (s2.com) 13b.

On receipt of the command C2b, the second server 13b issues an acknowledge (RTSP/1.0 OK) R2b indicating that the command C2b has been accepted, to the client terminal, and transmits SDP (Session Description Protocol) information to the client terminal.

Next, the data reception apparatus 110 of the client terminal issues a setup request command (SETUP rtsp://s2.com/adv.mpg) C3b, which requests the second server (s2.com) 13b to set up provision of the media data corresponding to the first foreground image (adv), to the second server 13b. Upon completion of setup for the media data, the second server 13b issues an acknowledge (RTSP/1.0 OK) R3b indicating that the command C3b has been accepted, to the client terminal.
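For readers unfamiliar with RTSP, the requests of figure 5 can be written out as plain text in the RTSP/1.0 format of RFC 2326: a request line, a CSeq header, optional headers and a blank line. The helper below only builds request strings and opens no connection. The CSeq values, the client port pair in the Transport header and the session identifier are hypothetical placeholders, and an actual server replies with a status line such as `RTSP/1.0 200 OK` rather than the abbreviated form used in the figure.

```python
from typing import Optional

def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[dict[str, str]] = None) -> str:
    """Build an RTSP/1.0 request: request line, CSeq, optional headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

url = "rtsp://s3.com/mov.mpg"
c2a = rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"})
c3a = rtsp_request("SETUP", url, 2, {"Transport": "RTP/AVP;unicast;client_port=5000-5001"})
# Sent later, when the trigger for t = -5 s arrives; the session id would come from the SETUP reply.
c4a = rtsp_request("PLAY", url, 3, {"Session": "12345678"})
print(c2a + c3a + c4a)
```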

When the time of the clock reaches a set time stored in the signal generation unit 104, the signal generation unit 104 generates a trigger signal St and outputs it to the control data recording unit 103. Since the set times stored in the signal generation unit 104 are -5, -2, 0, 5, and 10 sec., the signal generation unit 104 outputs a trigger signal every time the clock time reaches -5, -2, 0, 5, and 10 sec. Upon reception of every trigger signal, the control data recording unit 103 issues the control command included in the corresponding event on the time table to the corresponding control target.
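The reaction of the control data recording unit 103 to each trigger amounts to looking up the events scheduled at that time and routing each command to its control target. In the hypothetical Python sketch below, the two control targets are stand-in callbacks that simply record what they receive, and events are matched by exact time equality, which is a simplification that works here because the trigger times are copied from the table itself.

```python
from typing import Callable

# (time, control target, control command) rows of the time table in figure 4
TimeTable = list[tuple[float, str, str]]

def make_trigger_handler(table: TimeTable,
                         targets: dict[str, Callable[[str], None]]) -> Callable[[float], None]:
    """Return a handler that issues every command scheduled at the trigger time."""
    def on_trigger(t: float) -> None:
        for time, target, command in table:
            if time == t:
                targets[target](command)
    return on_trigger

table: TimeTable = [
    (-5, "request", "PLAY rtsp://s3.com/mov.mpg"),
    (-2, "request", "PLAY rtsp://s2.com/adv.mpg"),
    (0, "display", "bg//width300/height200"),
    (5, "display", "adv//left0/top150/width300/height50"),
    (10, "display", "mov//left50/top0/width200/height150"),
]
issued: list[str] = []
handler = make_trigger_handler(table, {
    "request": lambda cmd: issued.append("unit 105 <- " + cmd),
    "display": lambda cmd: issued.append("unit 109 <- " + cmd),
})
for t in (-5, -2, 0, 5, 10):  # trigger signals St from the signal generation unit
    handler(t)
print("\n".join(issued))
```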

First of all, when a trigger signal outputted from the signal generation unit 104 at time t (=-5 sec.) is input to the control data recording unit 103, the control data recording unit 103 outputs the control command (PLAY rtsp://s3.com/mov.mpg) C4a of the first event on the time table to the data request/reception unit 105 as the target of this control command.

The data request/reception unit 105 outputs a message by RTSP requesting the image data (mov.mpg) to the third server (s3.com) 13c, on the basis of the control command (PLAY rtsp://s3.com/mov.mpg) C4a from the control data recording unit 103.

On receipt of the message from the data request/reception unit 105, the third server 13c transmits the image data (mov.mpg) Dm2 by RTP to the data reception apparatus 110.

The image data (mov.mpg) Dm2 transmitted from the server 13c is received by the media data reception unit 106b. The image data Dm2 is a bit stream compressively coded by a coding method based on the MPEG standard or the like. The bit stream (image data) inputted to the media data reception unit 106b is output to the decoding unit 107b frame by frame. In the decoding unit 107b, the bit stream is decoded frame by frame. The decoded image data Dd2 obtained in the decoding unit 107b is stored in the frame memory 108b frame by frame.

When a trigger signal outputted from the signal generation unit 104 at time t (=-2 sec.) is input to the control data recording unit 103, the control data recording unit 103 outputs the control command (PLAY rtsp://s2.com/adv.mpg) C4b of the second event on the time table to the data request/reception unit 105 as the target of this control command.

The data request/reception unit 105 outputs a message by RTSP requesting the image data (adv.mpg) Dm1 to the second server (s2.com) 13b, on the basis of the control command (PLAY rtsp://s2.com/adv.mpg) C4b from the control data recording unit 103.

On receipt of the message from the data request/reception unit 105, the second server 13b transmits the image data (adv.mpg) Dm1 by RTP to the data reception apparatus 110.

The image data (adv.mpg) Dm1 transmitted from the server 13b is received by the media data reception unit 106a. The image data Dm1 is a bit stream compressively coded by a coding method based on the MPEG standard or the like. The bit stream (image data) inputted to the media data reception unit 106a is output to the decoding unit 107a frame by frame. In the decoding unit 107a, the bit stream is decoded frame by frame. The decoded image data Dd1 obtained in the decoding unit 107a is stored in the frame memory 108a frame by frame.

When a trigger signal outputted from the signal generation unit 104 at time t (=0 sec.) is input to the control data recording unit 103, the control data recording unit 103 outputs the control command (bg//width300/height200) of the third event on the time table, as control data Dc1, to the display unit 109 as the target of this control command. The display unit 109 displays the background image (bg) over the image space according to the control command (bg//width300/height200) from the control data recording unit 103. The data of the background image is obtained by the data reception apparatus 110 in advance. At this point of time (t=0 sec.), the display start times of the first and second foreground images indicated by the begin attributes of the video elements 204a and 205a, respectively, are later than 0 sec., and therefore the first foreground image (adv) and the second foreground image (mov) are not yet displayed in the adv region (first foreground display region) 1120 and the mov region (second foreground display region) 1130, respectively.

When a trigger signal outputted from the signal generation unit 104 at time t (=5 sec.) is input to the control data recording unit 103, the control data recording unit 103 outputs the control command (adv//left0/top150/width300/height50) of the fourth event on the time table, as control data Dc1, to the display unit 109 as the target of this control command. In the display unit 109, the decoded image data Dd1 is read frame by frame from the frame memory 108a on the basis of the control command (adv//left0/top150/width300/height50) from the control data recording unit 103, and the first foreground image (adv) is combined with the background image such that it is placed on the adv region (first foreground display region) 1120 in the image space 1100.

Further, when a trigger signal St outputted from the signal generation unit 104 at time t (=10 sec.) is input to the control data recording unit 103, the control data recording unit 103 outputs the control command (mov//left50/top0/width200/height150) of the fifth event on the time table, as control data Dc1, to the display unit 109 as the target of this control command. In the display unit 109, the decoded image data Dd2 is read frame by frame from the frame memory 108b on the basis of the control command (mov//left50/top0/width200/height150) from the control data recording unit 103, and the second foreground image (mov) is combined with the background image and the first foreground image such that it is placed on the mov region (second foreground display region) 1130 in the image space 1100.
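The compositing performed by the display unit 109 amounts to painting each region rectangle onto the image space in display order. The sketch below takes the geometry from the display control commands (background 300 x 200, adv at left 0 / top 150, mov at left 50 / top 0) and scales it down by a factor of 50 so the result fits in a few characters; the character grid, the single-letter labels and the painting order are illustrative assumptions standing in for real frame buffers.

```python
def compose(width: int, height: int,
            layers: list[tuple[str, int, int, int, int]]) -> list[str]:
    """Paint (label, left, top, width, height) rectangles in order onto a '.' background."""
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for label, left, top, w, h in layers:
        for y in range(top, min(top + h, height)):
            for x in range(left, min(left + w, width)):
                canvas[y][x] = label
    return ["".join(row) for row in canvas]

SCALE = 50  # 300 x 200 pixels -> 6 x 4 cells
regions = [  # from the control commands; adv is shown from t = 5 s, mov from t = 10 s
    ("A", 0, 150, 300, 50),
    ("M", 50, 0, 200, 150),
]
scaled = [(label, left // SCALE, top // SCALE, w // SCALE, h // SCALE)
          for label, left, top, w, h in regions]
for row in compose(300 // SCALE, 200 // SCALE, scaled):
    print(row)
```

The mov rectangle occupies the upper middle of the space and the adv strip runs along the bottom, so in this layout the two foreground regions do not overlap.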

Figure 6 is a flowchart illustrating a specific process of calculating the time to issue a control command requesting media data in the control data recording unit 103. Hereinafter, this calculation process will be described briefly. In the flowchart shown in figure 6, the first set time T1[n] is the time to issue a control command requesting the media data corresponding to the video element with index n in the scene description SD1 (hereinafter referred to simply as the media data request time), and the second set time T2[n] is the time to display the media data corresponding to that video element.

Furthermore, figure 6 illustrates a process of calculating the media data request time T1[n] by introducing, in addition to the prebuffering time, the time C required from when the control command requesting the media data is issued to the server to when the client receives the media data.

First of all, in the control data recording unit 103, a first internal variable n used for time calculation is set to zero (step S501). The variable n is incremented by 1 every time the time calculation on a video element in the scene description SD1 is completed.

Next, a video element to be subjected to the time calculation process (target video element) is decided on the basis of the analysis data Da1 from the SMIL request/reception unit 102 (step S502). Usually, target video elements are selected successively from the head of the plural video elements, which are arranged in a predetermined order in the scene description SD1. Therefore, of the video elements 204a and 205a, the video element 204a is selected first.

Subsequently, in the control data recording unit 103, the value "7" of the prebuffering attribute of the video element 204a is set as the second internal variable P used for the time calculation process, and the value "5" of the begin attribute of the video element 204a is set as the third internal variable B used for the time calculation process (step S503).

Thereafter, in the control data recording unit 103, the first set time T1[n] is calculated on the basis of the following formula (1) (step S504).

$$T_1[n] = B - P - C \qquad (1)$$

wherein C is a constant indicating the time required from when the data request/reception unit 105 issues a control command requesting media data to when the data reception unit receives the media data, and the value of the constant C is set by predicting the time from the issue of the request control command to the reception of the data. In this first embodiment, the constant C is set at 0 sec.

Accordingly, when 5, 7, and 0 (sec.), measured from the scene display start time (0 sec.), are assigned to the variables B, P, and C in formula (1), respectively, the first set time T1[0] corresponding to the first video element 204a becomes -2, and the time to issue the control command requesting the media data of the first foreground image (adv) is two seconds before the scene display start time (0 sec.).

Further, in the control data recording unit 103, the second set time T2[n] is calculated on the basis of the following formula (2) (step S505).

$$T_2[n] = B \qquad (2)$$

As a result, the second set time T2[0] corresponding to the first video element 204a becomes 5, and the time to display the first foreground image (adv) is five seconds after the scene display start time (0 sec.).

Thereafter, in the control data recording unit 103, it is decided whether or not the first and second set times have been calculated for all of the video elements shown in the scene description SD1 (step S506). When the first and second set times have already been calculated for all of the video elements, the first and second set times T1[n] and T2[n] (n=0, 1) of the respective video elements and the scene display start time are entered in the time field of the time table (step S508).

On the other hand, when the first and second set times have not yet been calculated for all of the video elements, the value of the variable n is incremented by 1 (step S507), and the processes in steps S502 to S506 are repeated.

In the data reception apparatus 110, when the calculation of the first and second set times for the video element 204a has been completed, the calculation of the set times for the video element 205a has not been completed yet. Therefore, the value of the variable n is incremented by 1 (step S507), and the processes in steps S502 to S506 are performed on the video element 205a.

When the first and second set times are calculated for the video element 205a, 10, 15, and 0 (sec.), measured from the scene display start time (0 sec.), are assigned to the variables B, P, and C in formula (1), respectively. As a result, the first set time T1[1] of the second video element 205a becomes -5, and the time to issue the control command requesting the media data of the second foreground image (mov) is five seconds before the scene display start time (0 sec.). Further, the second set time T2[1] of the second video element 205a becomes 10, and the time to display the second foreground image (mov) is ten seconds after the scene display start time (0 sec.).

At this point of time, calculation of the first and second set times for all of the video elements has been completed. Therefore, the first and second set times T1[0] and T2[0] of the video element 204a, the first and second set times T1[1] and T2[1] of the video element 205a, and the scene display start time Tbg are entered in the time field of the time table (step S508).

That is, in the time field of the time table, -5, -2, 0, 5, and 10 sec. are entered in this order as time information of the control commands.
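Steps S501 to S508 can be read as a simple loop over the video elements. The Python sketch below is one possible rendering, assuming each video element has already been reduced to its begin and prebuffering values in seconds (the dictionary keys and the function name are hypothetical). It evaluates formulas (1) and (2) and returns the entries of the time field in chronological order. Unlike the flowchart, which tests for completion after step S505, the loop tests at the top, which gives the same result for any scene with at least one video element.

```python
def figure6_set_times(videos: list[dict], C: float = 0, Tbg: float = 0) -> list[float]:
    """Steps S501 to S508 of figure 6 for video elements given in document order."""
    T1: list[float] = []
    T2: list[float] = []
    n = 0                                           # S501: first internal variable n
    while n < len(videos):                          # S506: stop once every element is done
        video = videos[n]                           # S502: target video element
        P, B = video["prebuffering"], video["begin"]  # S503
        T1.append(B - P - C)                        # S504: formula (1)
        T2.append(B)                                # S505: formula (2)
        n += 1                                      # S507
    return sorted(T1 + T2 + [Tbg])                  # S508: time field, chronological order

videos = [{"begin": 5, "prebuffering": 7}, {"begin": 10, "prebuffering": 15}]
print(figure6_set_times(videos))        # [-5, -2, 0, 5, 10]
print(figure6_set_times(videos, C=1))   # [-6, -3, 0, 5, 10]
```

With C = 1 only the two request times move one second earlier; the display times T2[n] are unaffected, since formula (2) does not contain C.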

As described above, the data reception apparatus 110 of the first embodiment is provided with: the SMIL request/reception unit 102, which requests the server 13a to transmit the SMIL data Ds1 as data indicating the scene description SD1 for combining the first and second foreground images (adv) and (mov) with the background image (bg) to display the composite image, and receives the SMIL data Ds1 from the server 13a; the data request/reception unit 105, which requests the servers 13b and 13c to transmit the media data Dm1 and Dm2 of the respective foreground images, and receives the messages from the servers; and the control data generation unit 110a, which controls the data request/reception unit 105 so that the media data request messages are issued to the corresponding servers at times that are the latency times earlier than the display start times of the respective foreground images, on the basis of the information, included in the SMIL data Ds1, indicating the latency times before starting display of the respective foreground images. Therefore, each foreground image can be combined with the background image to display a composite image at the time designated by the scene description.

Further, by setting the latency time at a sufficient value considering the condition of the network through which the media data is transmitted (e.g., bandwidth, congestion, etc.), playback of the media data by the data reception apparatus is hardly affected by jitter in the network, which prevents image display from being interrupted during playback of the media data.
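The effect of the latency time on jitter can be illustrated with a toy buffer model. In the hypothetical sketch below, time is divided into abstract ticks, the decoder consumes one frame per tick once the prebuffering period has elapsed, and the arrival pattern (an average of one frame per tick with a four-tick gap) is invented for illustration; none of these values come from the description.

```python
def playback_stalls(arrivals: list[int], prebuffer_ticks: int) -> int:
    """Count ticks on which the decoder needs a frame but the buffer is empty.

    arrivals[i] is the number of frames received during tick i (tick 0 is the request);
    the decoder consumes one frame per tick from tick `prebuffer_ticks` onward.
    """
    buffered = 0
    stalls = 0
    for tick, count in enumerate(arrivals):
        buffered += count
        if tick >= prebuffer_ticks:
            if buffered > 0:
                buffered -= 1      # decode one frame
            else:
                stalls += 1        # display would be interrupted here
    return stalls

# One frame per tick on average, but nothing arrives during ticks 4 to 7.
jittery = [1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 1, 1, 1, 1]
print(playback_stalls(jittery, prebuffer_ticks=1))  # short latency
print(playback_stalls(jittery, prebuffer_ticks=4))  # longer latency
```

With a one-tick latency the decoder runs dry on three ticks during the gap, while a four-tick latency lets the frames accumulated before decoding starts cover the gap entirely.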

Furthermore, in the data reception apparatus 110 of the first embodiment, the control data recording unit 103 manages the time to request media data of each foreground image from the server and the time to display each foreground image, with reference to the time table containing information about these times, created on the basis of the SMIL data. Further, the control data recording unit 103 issues a control command instructing the data request unit to make a request for media data, or a control command instructing the display unit to start display of media data, every time the clock time in the reception apparatus reaches a time described on the time table. Therefore, even when the number of foreground images constituting the composite image is increased, comparing the clock time with the time information described on the time table allows the data request unit to make a request for each media data at an appropriate time before starting display of each foreground image, whereby each foreground image is displayed satisfactorily.

In this first embodiment, the control data recording unit 103 calculates the time to issue a control command with the delay time C set at 0, this delay time being the time required from when the unit 103 issues a control command requesting media data to the server to when the media data is received. However, the delay time C may be set at an arbitrary value larger than 0 according to the type of the network (e.g., a network including a radio communication line, or a network comprising only a wired communication line).

While in this first embodiment the data reception apparatus receives video data as media data, the media data is not restricted to video data, and it may be text data, audio data, or the like. Also in this case, the same effects as mentioned above are achieved.

While in this first embodiment the video data supplied to the data reception apparatus have been compressively coded by MPEG, the video data may have been compressively coded by other coding methods, such as JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), H.261, H.263, and the like.

While in this first embodiment the scene description data designates RTSP as the transmission protocol for making a data request, the scene description data may designate other protocols such as HTTP (Hyper Text Transfer Protocol) and the like.

Furthermore, in this first embodiment, the control data recording unit 103 calculates the time to issue a control command to the data request unit or the display unit, and the signal generation unit 104 sets the control command issue time calculated by the unit 103 as a trigger generation time and outputs a trigger signal to the control data recording unit 103 every time the clock time in the signal generation unit 104 reaches the set trigger generation time. However, the time to issue a control command to the data request/reception unit or the display unit may be calculated by the signal generation unit 104. In this case, the control data recording unit 103 must manage the respective control commands according to their issue times.

While in this first embodiment the data reception apparatus calculates the time to request media data from the server by using the prebuffering attribute value, which is attached to the video element and indicates the latency time, the data reception apparatus may calculate the media data request time by using a request attribute value, which indicates the time to output a request message to the server.

[Embodiment 2]

Figure 7 is a block diagram for explaining a data reception apparatus 120 according to a second embodiment of the present invention.

The data reception apparatus 120 of this second embodiment employs, as scene description data, SMIL data Ds2, which is different from the SMIL data Ds1 used for the first embodiment, and the apparatus 120 is provided with a control data generation unit 120a for generating control data Dc1 and Dc2 on the basis of the SMIL data Ds2, instead of the control data generation unit 110a of the first embodiment, which generates control data Dc1 and Dc2 on the basis of the SMIL data Ds1.

Figure 8 is a diagram illustrating the contents of the SMIL data Ds2 (scene description SD2) supplied as scene description data to the data reception apparatus 120.

The SMIL data Ds2 includes a request attribute value indicating the time to output a data request message to the server, instead of the prebuffering attribute in the SMIL data Ds1. That is, in the SMIL data Ds2, a video element 601a has a request attribute (request="-2s") indicating that a request message for the image data (adv.mpg) is output two seconds before starting scene display, instead of the prebuffering attribute (prebuffering="7s") possessed by the video element 204a of the SMIL data Ds1. Further, in the SMIL data Ds2, a video element 602a has a request attribute (request="-5s") indicating that a request message for the image data (mov.mpg) is output five seconds before starting scene display, instead of the prebuffering attribute (prebuffering="15s") possessed by the video element 205a of the SMIL data Ds1.

In figure 8, a row 601 including the video element 601a corresponds to the row including the video element 204a in the scene description SD1 of the first embodiment, and a row 602 including the video element 602a corresponds to the row including the video element 205a in the scene description SD1 of the first embodiment.

In the data reception apparatus 120 of this second embodiment, the control data generation unit 120a comprises a control data recording unit 123 and a trigger signal generation unit 124. The control data recording unit 123 creates the time table Tab shown in figure 4, outputs the control data Dc1, and outputs time information It, on the basis of SMIL analysis data Da2 obtained by analyzing the SMIL data Ds2. The trigger signal generation unit 124 is identical in construction to the trigger signal generation unit 104 included in the data reception apparatus 110 of the first embodiment.

Also in the data reception apparatus 120, as in the data reception apparatus 110 of the first embodiment, a request message for each media data is issued to a specific server at a time a predetermined period earlier than the display start time of each foreground image, on the basis of the SMIL data Ds2 supplied from the server, whereby each foreground image can be combined with the background image for display at the time designated by the scene description.

In this second embodiment, a media data request message is issued at the timing when the clock time in the data reception apparatus reaches the time designated by the request attribute in the SMIL data Ds2. However, as described for the first embodiment, the media data request message may be issued taking into account the constant C, which is the time required from when the media data request message is transmitted to the server to when the media data is received; that is, the request message may be issued earlier, by the constant C, than the time designated by the request attribute.
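
The timing rule above reduces to one line of arithmetic. The following is a minimal Python sketch, assuming all times are in seconds on the receiver clock and that the receiver already holds an estimate of the constant C (how C is obtained is not specified here). The function name is hypothetical.

```python
def request_issue_time(scene_start: float, request_offset: float,
                       constant_c: float = 0.0) -> float:
    """Receiver clock time at which to send a media data request message.

    scene_start    -- receiver clock time at which display of the scene starts (s)
    request_offset -- request attribute value relative to scene start (s),
                      e.g. -5.0 for request="-5s"
    constant_c     -- estimate of constant C, the time from sending the request
                      to receiving the media data (s); 0.0 reproduces the
                      basic second-embodiment behaviour
    """
    return scene_start + request_offset - constant_c


# Scene planned to start at receiver clock time 100 s, request="-5s".
print(request_issue_time(100.0, -5.0))        # 95.0
print(request_issue_time(100.0, -5.0, 0.5))   # 94.5, i.e. C = 0.5 s earlier
```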

Further, while in the first and second embodiments the attribute indicating the latency time before starting display of each foreground image is called the "prebuffering attribute" and the attribute indicating the time to issue a media data request message is called the "request attribute", these attributes may be given other names so long as their meanings are the same.
While in the first embodiment the latency time from when the image data of an image is requested from the server to when display of the image is started is determined on the basis of the SMIL data, the latency time may be determined on the basis of control data other than the SMIL data. For example, when the input data is a bit stream encoded by MPEG coding, the latency time may be set on the basis of the VBV (Video Buffering Verifier) delay information multiplexed in the header of each frame of the video bit stream, such that the latency time is longer than the delay time indicated by this information. In this case, the following effects are achieved.

In a video decoder receiving a bit stream transmitted at a constant transmission rate, since the amount of video data varies frame by frame, the latency time from when a bit stream is received to when the bit stream is decoded also varies frame by frame. The VBV delay value multiplexed in the header of each frame of the bit stream indicates this delay time. Therefore, by starting decoding of the video bit stream when the time indicated by the VBV delay value has passed after reception of the video data, the buffer of the decoder is prevented from underflowing or overflowing. However, since the information indicating the VBV delay value is multiplexed in the bit stream itself, it is impossible to know the VBV delay value in advance of reception of the bit stream.
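
As a rough illustration of setting the latency time from VBV delay information, the Python sketch below converts per-frame VBV delay values into seconds and picks a latency longer than all of them. It assumes the MPEG-1/MPEG-2 video convention that vbv_delay is a 16-bit count of 90 kHz clock periods, with 0xFFFF meaning the value is not specified. The safety margin and the function names are hypothetical choices, not taken from this description.

```python
from typing import Iterable

VBV_CLOCK_HZ = 90_000           # vbv_delay counts periods of a 90 kHz clock
VBV_DELAY_UNSPECIFIED = 0xFFFF  # 16-bit value meaning "delay not specified"


def vbv_delay_seconds(vbv_delay: int) -> float:
    """Convert one frame's vbv_delay field to seconds."""
    if not 0 <= vbv_delay < VBV_DELAY_UNSPECIFIED:
        raise ValueError(f"vbv_delay {vbv_delay:#x} is unspecified or out of range")
    return vbv_delay / VBV_CLOCK_HZ


def latency_from_vbv(vbv_delays: Iterable[int], margin: float = 0.5) -> float:
    """Latency time (s) chosen to exceed every VBV delay supplied.

    margin is a hypothetical safety allowance added to the largest delay.
    """
    delays = [vbv_delay_seconds(d) for d in vbv_delays]
    if not delays:
        raise ValueError("no vbv_delay values supplied")
    return max(delays) + margin


print(latency_from_vbv([22_500, 45_000]))  # 0.25 s and 0.5 s -> 1.0
```

Because the vbv_delay values sit in the frame headers, this computation needs access to the bit stream itself, which is exactly the limitation the paragraph above points out.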

While in the first and second embodiments the data reception unit 106a (106b) outputs one frame of media data to the decoding unit 107a (107b) every time it receives one frame of media data, the construction of the data reception unit is not restricted thereto.

For example, the data reception unit 106a (106b) may have a memory to hold the received media data, and it may read the media data from the memory and output it to the decoding unit 107a (107b) at the time when the clock time in the data reception apparatus reaches the display start time indicated by the begin attribute in the SMIL data. Alternatively, the data reception unit 106a (106b) may have a memory to hold the received media data, and it may read the media data from the memory and output it to the decoding unit 107a (107b) at the time when the clock time in the data reception apparatus reaches a time a predetermined period (e.g., one second) before the display start time indicated by the begin attribute in the SMIL data.
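
Both buffered variants can be sketched with one small class. This is a minimal Python sketch, assuming frames arrive as byte strings and that the caller polls with the current receiver clock time. The class and method names are hypothetical and merely stand in for the data reception unit 106a (106b) and its memory.

```python
from collections import deque
from typing import Deque, List


class BufferedReceptionUnit:
    """Sketch of a data reception unit with a memory for received frames.

    Frames are held until the receiver clock reaches `begin - lead`, then
    handed to the decoding unit. lead=0.0 corresponds to the first variant
    (release at the begin time); a positive lead such as 1.0 corresponds to
    the second variant (release a predetermined period earlier).
    """

    def __init__(self, begin: float, lead: float = 0.0) -> None:
        self.release_time = begin - lead
        self._memory: Deque[bytes] = deque()

    def receive(self, frame: bytes) -> None:
        """Store one received frame of media data in the memory."""
        self._memory.append(frame)

    def frames_for_decoder(self, clock: float) -> List[bytes]:
        """Return (and remove) buffered frames once the release time is reached."""
        if clock < self.release_time:
            return []
        frames = list(self._memory)
        self._memory.clear()
        return frames


unit = BufferedReceptionUnit(begin=10.0, lead=1.0)
unit.receive(b"frame-1")
unit.receive(b"frame-2")
print(unit.frames_for_decoder(8.5))  # [] (release time is 9.0 s)
print(unit.frames_for_decoder(9.0))  # [b'frame-1', b'frame-2']
```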

In the above-described construction, which starts decoding the received media data at the display start time indicated by the begin attribute, however, since decoding of the media data is started only at the media data display start time, there is the possibility that the decoded image data is not stored in the frame memory by a predetermined time, depending on the performance of the decoding unit.

While in the first and second embodiments the SMIL data is received as control data, the control data is not restricted to the SMIL data. For example, the control data may be any of the following: XHTML (Extensible Hyper Text Markup Language) defined by W3C, HTML (Hyper Text Markup Language) + TIME (Timed Interactive Multimedia Extensions), SDP (Session Description Protocol) defined by IETF (Internet Engineering Task Force), and BIFS (Binary Format for Scene) defined by the MPEG standard.
Further, while in the first and second embodiments the data reception apparatus is implemented by hardware, it may be implemented by software.

For example, in the data reception apparatus 110, the request/reception unit 102, the trigger signal generation unit 104, the data request/reception unit 105, the media data reception units 106a and 106b, the decoding units 107a and 107b, and the unit 109 can be implemented in a computer system using a program in which the functions of these units are programmed so as to be performed by a CPU (Central Processing Unit).

Even when the data reception apparatus 110 of the first embodiment is implemented by software, the same effects as described for the first embodiment are achieved.

The above-described software program can be stored in storage media such as a floppy disk, an optical disk, a ROM cassette, and the like.

In the first and second embodiments, a server (data transmission apparatus) corresponding to a receiving terminal (client terminal) having a data reception apparatus which receives media data and control data such as SMIL data transmits the SMIL data, which includes information indicating the latency time before display of media data (prebuffering attribute) and information indicating the time to request media data from the server (request attribute), to the receiving terminal. However, the data transmission apparatus may instead transmit control data other than the SMIL data that includes the information indicating the latency time and the information indicating the data request time.

For example, a server (data transmission apparatus) corresponding to a receiving terminal which combines plural pieces of media data for display transmits control data, such as response data to a request from the receiving terminal, which includes the prebuffering attribute value, the request attribute value, or an equivalent attribute value, before transmitting media data, and then transmits the media data according to a data request message from the receiving terminal. Also in this case, the receiving terminal can request media data at an appropriate data request time.
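
As one possible form of such response data, the Python sketch below builds an RTSP DESCRIBE acknowledge whose SDP body carries the prebuffering value, in the manner of the figure 9 example. It is an illustration under stated assumptions: the a=prebuffering attribute name is the one used in this description, the m= line values (port 0, RTP/AVP payload type 32 for MPEG video) are placeholders, and a complete SDP description would also need o=, s= and t= lines, which are omitted here. The function names are hypothetical.

```python
def sdp_with_prebuffering(prebuffering_s: int) -> str:
    """Minimal SDP body carrying the prebuffering (latency) time in seconds."""
    lines = [
        "v=0",                                # SDP version
        "m=video 0 RTP/AVP 32",               # placeholder video media line
        f"a=prebuffering:{prebuffering_s}s",  # latency before display
    ]
    return "\r\n".join(lines) + "\r\n"


def describe_acknowledge(cseq: int, prebuffering_s: int) -> str:
    """RTSP/1.0 answer to a DESCRIBE request, with the SDP above as its body."""
    body = sdp_with_prebuffering(prebuffering_s)
    headers = [
        "RTSP/1.0 200 OK",
        f"CSeq: {cseq}",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body


# Answer a server might give for mov.mpg (CSeq value assumed).
print(describe_acknowledge(cseq=1, prebuffering_s=15))
```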

Hereinafter, a description will be given of an example of data exchange between a data reception apparatus included in a receiving terminal and a data transmission apparatus which transmits the information indicating the latency time, included in control data other than the SMIL data.

Figure 9 is a diagram illustrating an example of data exchange between a data transmission apparatus (server) for transmitting media data and a data reception apparatus, in the case where the data transmission apparatus transmits the information indicating the prebuffering time (latency time), included in SDP.

In figure 9, each of a second server (data transmission apparatus) 23b transmitting the media data of the first foreground image and a third server (data transmission apparatus) 23c transmitting the media data of the second foreground image transmits SDP including the prebuffering time (latency time). A first server (data transmission apparatus) 23a is identical in construction to the first server 13a shown in figure 14. A data reception apparatus 130a is mounted on a personal computer serving as a client terminal, and is supplied with SMIL data Ds describing the scene description SD shown in figure 12. The construction of the data reception apparatus 130a is identical to that of the data reception apparatus 901 (refer to figure 13) except for the data request/reception unit 908. That is, in the data reception apparatus 130a, the data request/reception unit 908 obtains the information indicating the prebuffering time (latency time) and supplies it to the control data generation unit 907.

When the user, who is viewing a home page described in HTML (Hyper Text Markup Language) using a Web browser installed on the personal computer, clicks a region on the home page linked to predetermined SMIL data, the data reception apparatus 130a of the client terminal issues an SMIL request command (GET http://s1.com/scene.smil) C1 requesting the SMIL data Ds. This command C1 requests the first server (s1.com) 23a to distribute the SMIL data by HTTP.

On receipt of the SMIL request command C1, the first server 23a issues an acknowledge (HTTP/1.0 OK) R1 indicating that the command has been accepted, to the client terminal, and transmits the SMIL data (scene.smil) Ds to the client terminal.

In the data reception apparatus 130a of the client terminal, the SMIL request/reception unit 906 receives the SMIL data Ds and analyzes it.

The SMIL analysis data Da obtained by the analysis of the SMIL data is stored in the control data generation unit 907.

Thereafter, the data reception apparatus 130a issues a command (DESCRIBE rtsp://s3.com/mov.mpg) C2a requesting information relating to the media data corresponding to the second foreground image (mov) (e.g., coding condition, existence of plural candidate data, etc.), to the third server (s3.com) 23c.

On receipt of the command C2a, the third server 23c issues an acknowledge R20a indicating that the command has been accepted, to the client terminal. This acknowledge R20a includes an OK message (RTSP/1.0 OK) 21a indicating that the DESCRIBE command C2a has been accepted, and SDP (Session Description Protocol) information R22a. The SDP information R22a includes (a=prebuffering:15s) information, in addition to information required for decoding of media data at the receiving terminal and information required for transmission of media data. In the SDP information R22a, (v=0) indicates version information relating to the construction of the SDP, and "m=video" indicates that information relating to video data is described after the "m=video". The (a=prebuffering:15s) information indicates that the latency time from requesting the media data of the second foreground image (mov) to displaying the foreground image is 15 seconds.
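
On the receiving side, the data request/reception unit 908 has to pull the prebuffering value out of the SDP information. A minimal Python sketch, assuming the attribute appears as a line of the form a=prebuffering:15s as in R22a; the m= line in the sample is a placeholder and the function name is hypothetical.

```python
from typing import Optional


def prebuffering_from_sdp(sdp: str) -> Optional[float]:
    """Return the a=prebuffering value in seconds, or None if absent."""
    prefix = "a=prebuffering:"
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            if value.endswith("s"):
                value = value[:-1]  # drop the seconds suffix
            return float(value)
    return None


sdp_r22a = "v=0\r\nm=video 0 RTP/AVP 32\r\na=prebuffering:15s\r\n"
print(prebuffering_from_sdp(sdp_r22a))  # 15.0
```
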
Next, the data reception apparatus 130a of the client terminal issues a setup request command (SETUP rtsp://s3.com/mov.mpg) C3a, which requests the third server (s3.com) 23c to set up provision of the media data corresponding to the second foreground image (mov). Upon completion of setup for the media data, the third server 23c issues an acknowledge (RTSP/1.0 OK) R3a indicating that the command C3a has been accepted, to the client terminal.
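
The DESCRIBE and SETUP commands sent to the third server are ordinary RTSP/1.0 requests. The Python sketch below formats them. The CSeq numbering follows the figure 10 description of this exchange (1 for DESCRIBE, 2 for SETUP); the Accept and Transport header values and the client ports are hypothetical, since the description does not give them, and the helper name is illustrative.

```python
from typing import Dict, Optional


def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[Dict[str, str]] = None) -> str:
    """Format an RTSP/1.0 request with a CSeq header and optional extra headers."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the header section


describe_c2a = rtsp_request("DESCRIBE", "rtsp://s3.com/mov.mpg", 1,
                            {"Accept": "application/sdp"})
setup_c3a = rtsp_request("SETUP", "rtsp://s3.com/mov.mpg", 2,
                         {"Transport": "RTP/AVP;unicast;client_port=5000-5001"})
print(setup_c3a)
```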

Subsequently, the data reception apparatus 130a issues a command (DESCRIBE rtsp://s2.com/adv.mpg) C2b requesting information relating to the media data corresponding to the first foreground image (adv) (e.g., coding condition, existence of plural candidate data, etc.), to the second server (s2.com) 23b.

On receipt of the command C2b, the second server 23b issues an acknowledge R20b indicating that the command has been accepted, to the client terminal. This acknowledge R20b includes an OK message (RTSP/1.0 OK) 21b indicating that the DESCRIBE command C2b has been accepted, and SDP (Session Description Protocol) information R22b. The SDP information R22b includes (a=prebuffering:7s) information as well as (v=0) information and (m=video) information. The (a=prebuffering:7s) information indicates that the latency time from requesting the media data (adv.mpg) of the first foreground image (adv) to displaying the first foreground image is 7 seconds.

Next, the data reception apparatus 130a of the client terminal issues a setup request command (SETUP rtsp://s2.com/adv.mpg) C3b, which requests the second server (s2.com) 23b to set up provision of the media data corresponding to the first foreground image (adv). Upon completion of setup for the media data, the second server 23b issues an acknowledge (RTSP/1.0 OK) R3b indicating that the command C3b has been accepted, to the client terminal.

Thereafter, the data reception apparatus 130a of the client terminal issues a data request command (PLAY rtsp://s3.com/mov.mpg) C4a requesting the media data (mov.mpg) corresponding to the second foreground image (mov), to the third server (s3.com) 23c, fifteen seconds before the display start time of the second foreground image (five seconds before the display start time of the entire scene). On receipt of this command C4a, the third server 23c issues an acknowledge (RTSP/1.0 OK) R4a indicating that the command C4a has been accepted, to the client terminal. Thereafter, the third server 23c transmits the media data Dm2 corresponding to the second foreground image (mov.mpg), stored in RTP packets, to the client terminal.

Further, the data reception apparatus 130a of the client terminal issues a data request command (PLAY rtsp://s2.com/adv.mpg) C4b requesting the media data (adv.mpg) corresponding to the first foreground image (adv), to the second server (s2.com) 23b, seven seconds before the display start time of the first foreground image (two seconds before the display start time of the entire scene). On receipt of this command C4b, the second server 23b issues an acknowledge (RTSP/1.0 OK) R4b indicating that the command C4b has been accepted, to the client terminal. Thereafter, the second server 23b transmits the media data corresponding to the first foreground image (adv.mpg), stored in RTP packets, to the client terminal.
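
The PLAY timing in this exchange follows directly from the prebuffering values: each PLAY command is sent one latency time before the corresponding display start. A Python sketch, assuming times in seconds relative to the start of scene display. The display start times 10 s (mov) and 5 s (adv) are inferred from the text (15 s before display equals 5 s before the scene; 7 s before display equals 2 s before the scene). The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def play_schedule(streams: Dict[str, Tuple[float, float]]) -> List[Tuple[float, str]]:
    """Return (PLAY time, url) pairs in sending order.

    streams maps each media URL to (display start, prebuffering time), both in
    seconds, with display start measured from the start of scene display.
    """
    return sorted((begin - prebuffering, url)
                  for url, (begin, prebuffering) in streams.items())


figure9 = {
    "rtsp://s3.com/mov.mpg": (10.0, 15.0),  # second foreground image
    "rtsp://s2.com/adv.mpg": (5.0, 7.0),    # first foreground image
}
for t, url in play_schedule(figure9):
    print(f"PLAY {url} at {t:+.0f} s")
# PLAY rtsp://s3.com/mov.mpg at -5 s
# PLAY rtsp://s2.com/adv.mpg at -2 s
```
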
Thereafter, the respective media data are output to the display unit 905 and displayed at their display start times, on the basis of the result of analysis performed on the SMIL data.
The above-described method, in which the information indicating the latency time (prebuffering time) is included in control data other than SMIL data (SDP data) and transmitted from the data transmission apparatus to the data reception apparatus, is very effective for contents whose initial delay time (i.e., the latency time from requesting media data from the server to starting display of the media data) varies in real time (e.g., video of a concert broadcast live).

Figure 10 is a diagram illustrating an example of data exchange between a data transmission apparatus (server) for transmitting media data and a data reception apparatus, in the case where the data transmission apparatus transmits the information indicating the prebuffering time (latency time), included in an acknowledge to an RTSP SETUP request.

In figure 10, each of a second server (data transmission apparatus) 33b transmitting the media data of the first foreground image and a third server (data transmission apparatus) 33c transmitting the media data of the second foreground image transmits the information indicating the prebuffering time (latency time), included in an acknowledge to an RTSP SETUP request. A first server (data transmission apparatus) 33a is identical in construction to the first server 13a shown in figure 14. A data reception apparatus 130b is mounted on a personal computer serving as a client terminal, and is supplied with SMIL data Ds describing the scene description SD shown in figure 12. The construction of the data reception apparatus 130b is identical to that of the data reception apparatus 901 (refer to figure 13) except for the data request/reception unit 908. That is, in the data reception apparatus 130b, the data request/reception unit 908 obtains the information indicating the prebuffering time (latency time) and supplies it to the control data generation unit 907.

When the user, who is viewing a home page described in HTML (Hyper Text Markup Language) using a Web browser installed on the personal computer, clicks a region on the home page linked to predetermined SMIL data, the data reception apparatus 130b of the client terminal issues an SMIL request command (GET http://s1.com/scene.smil) C1 requesting the SMIL data Ds. This command C1 requests the first server (s1.com) 33a to distribute the SMIL data by HTTP.

On receipt of the SMIL request command C1, the first server 33a issues an acknowledge (HTTP/1.0 OK) R1 indicating that the command has been accepted, to the client terminal, and transmits the SMIL data (scene.smil) Ds to the client terminal.

In the data reception apparatus 130b of the client terminal, the SMIL request/reception unit 906 receives the SMIL data Ds and analyzes it.

The SMIL analysis data Da obtained by the analysis of the SMIL data is stored in the control data generation unit 907.

Thereafter, the data reception apparatus 130b issues a command (DESCRIBE rtsp://s3.com/mov.mpg) C2a requesting information relating to the media data corresponding to the second foreground image (mov) (e.g., coding condition, existence of plural candidate data, etc.), to the third server (s3.com) 33c.

On receipt of the command C2a, the third server 33c issues an acknowledge R2a indicating that the command has been accepted, to the client terminal, and transmits SDP (Session Description Protocol) information to the client terminal.

Next, the data reception apparatus 130b of the client terminal issues a setup request command (SETUP rtsp://s3.com/mov.mpg) C3a, which requests the third server (s3.com) 33c to set up provision of the media data corresponding to the second foreground image (mov). Upon completion of setup for the media data, the third server 33c issues an acknowledge R30a indicating that the command C3a has been accepted, to the client terminal.
This acknowledge R30a includes an OK message (RTSP/1.0 OK) 31a indicating that the SETUP command C3a has been accepted, and additional information 32a. The additional information 32a includes (a=prebuffering:15s) information, in addition to sequence number (CSeq:2) information, session number (Session:12345678) information, and the like. The (a=prebuffering:15s) information indicates that the latency time from requesting the media data of the foreground image (mov) to displaying the foreground image is 15 seconds. The sequence number (CSeq:2) is assigned to one message exchange between the data transmission apparatus and the data reception apparatus; the same sequence number is assigned to a command issued from the receiving terminal and to the acknowledge to that command from the server. Accordingly, although it is not shown in figure 10, a sequence number (CSeq:1) is given to the command (DESCRIBE rtsp://s3.com/mov.mpg) C2a and to the acknowledge R2a. The session number is assigned to the state, established between the data transmission apparatus and the data reception apparatus, in which data transmission is allowed.
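
A receiver handling this exchange has to read the prebuffering value from the SETUP acknowledge and check that its CSeq matches the request it sent. The Python sketch below assumes the additional information is carried as header-like lines (CSeq: 2, Session: 12345678, a=prebuffering:15s) terminated by a blank line. That line layout and the function names are assumptions, since the description lists the fields but not their exact encoding.

```python
from typing import Dict, Tuple


def parse_acknowledge(message: str) -> Tuple[str, Dict[str, str]]:
    """Split an RTSP acknowledge into its status line and header fields.

    Each line is split at its first colon, so a line such as
    'a=prebuffering:15s' yields the key 'a=prebuffering'.
    """
    head = message.split("\r\n\r\n", 1)[0]
    status, *lines = head.split("\r\n")
    fields = {}
    for line in lines:
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return status, fields


def latency_if_matching(message: str, sent_cseq: int) -> float:
    """Return the prebuffering time in seconds after checking status and CSeq."""
    status, fields = parse_acknowledge(message)
    if not status.endswith("OK"):
        raise RuntimeError(f"request not accepted: {status}")
    if int(fields["CSeq"]) != sent_cseq:
        raise RuntimeError("acknowledge does not answer this request")
    return float(fields["a=prebuffering"].rstrip("s"))


r30a = ("RTSP/1.0 200 OK\r\nCSeq: 2\r\nSession: 12345678\r\n"
        "a=prebuffering:15s\r\n\r\n")
print(latency_if_matching(r30a, sent_cseq=2))  # 15.0
```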

Subsequently, the data reception apparatus 130b issues a command (DESCRIBE rtsp://s2.com/adv.mpg) C2b requesting information relating to the media data corresponding to the first foreground image (adv) (e.g., coding condition, existence of plural candidate data, etc.), to the second server (s2.com) 33b.

On receipt of the command C2b, the second server 33b issues an acknowledge R2b indicating that the command has been accepted, to the client terminal, and transmits SDP (Session Description Protocol) information to the client terminal.

Next, the data reception apparatus 130b of the client terminal issues a setup request command (SETUP rtsp://s2.com/adv.mpg) C3b, which requests the second server (s2.com) 33b to set up provision of the media data corresponding to the first foreground image (adv). Upon completion of setup for the media data, the second server 33b issues an acknowledge R30b indicating that the command C3b has been accepted, to the client terminal.

This acknowledge R30b includes an OK message (RTSP/1.0 OK) 31b indicating that the SETUP command C3b has been accepted, and additional information 32b. The additional information 32b includes (a=prebuffering:7s) information, in addition to sequence number (CSeq:2) information, session number (Session: 1234...) information, and the like. The (a=prebuffering:7s) information indicates that the latency time from requesting the media data of the foreground image (adv) to displaying the foreground image is 7 seconds.

Thereafter, the data reception apparatus 130b of the client terminal issues a data request command (PLAY rtsp://s3.com/mov.mpg) C4a requesting the media data (mov.mpg) corresponding to the second foreground image (mov), to the third server (s3.com) 33c, fifteen seconds before the display start time of the second foreground image (five seconds before the display start time of the entire scene). On receipt of this command C4a, the third server 33c issues an acknowledge (RTSP/1.0 OK) R4a indicating that the command C4a has been accepted, to the client terminal. Thereafter, the third server 33c stores the media data corresponding to the second foreground image (mov.mpg) in RTP packets and transmits the media data, packet by packet, to the client terminal.

Further, the data reception apparatus 130b of the client terminal issues a data request command (PLAY rtsp://s2.com/adv.mpg) C4b requesting the media data (adv.mpg) corresponding to the first foreground image (adv), to the second server (s2.com) 33b, seven seconds before the display start time of the first foreground image (two seconds before the display start time of the entire scene). On receipt of this command C4b, the second server 33b issues an acknowledge (RTSP/1.0 OK) R4b indicating that the command C4b has been accepted, to the client terminal. Thereafter, the second server 33b stores the media data corresponding to the first foreground image (adv.mpg) in RTP packets and transmits the media data, packet by packet, to the client terminal.

Thereafter, the respective media data are output to the display unit 905 and displayed at their display start times, on the basis of the result of analysis performed on the SMIL data.

The above-described method, in which the information indicating the latency time (prebuffering time) is included in control data other than SMIL data (the acknowledge of the server to the SETUP request from the receiving terminal) and transmitted from the data transmission apparatus to the data reception apparatus, is very effective for contents whose initial delay time (i.e., the latency time from requesting media data from the server to starting display of the media data) varies in real time (e.g., video of a concert broadcast live).



